ABSTRACT OF THE DISCLOSURE
A novel seed coat specific peroxidase genomic sequence is characterized and
presented. Adjacent DNA regulatory regions have also been characterized. The seed
coat peroxidase is translated as a 352 amino acid precursor protein of 38 kDa
comprising a 26 amino acid signal sequence which, when cleaved, results in a 35 kDa
protein. Plants containing a dominant Ep allele accumulate large amounts of
peroxidase in the hourglass cells of the subepidermis. Homozygous recessive epep
genotypes do not accumulate peroxidase in the hourglass cells and are much reduced
in total seed coat peroxidase activity. Probes derived from the cDNA or genomic
DNA can be used to detect polymorphisms that distinguish EpEp and epep
genotypes. Cosegregation of the polymorphisms in an F2 population from a cross of
EpEp and epep plants shows that the Ep locus encodes the seed coat peroxidase
protein. Comparison of Ep and ep alleles indicates that the recessive gene lacks 87 bp
of sequence encompassing the translation start codon. The heterologous expression,
as well as vectors and hosts to be used for the expression of the seed coat peroxidase,
are also disclosed. The seed-specific DNA regulatory region may be used to control
expression of genes of interest such as i) genes encoding herbicide resistance, or ii)
biological control of insects or pathogens (e.g. B. thuringiensis), or iii) viral coat
proteins to protect against viral infections, or iv) proteins of commercial interest (e.g.
pharmaceutical), and v) proteins that alter the nutritive value, taste, or processing of
seeds.
Seed coat specific DNA regulatory region and peroxidase
The present invention relates to a novel DNA molecule comprising a plant seed 
coat specific DNA regulatory region and a novel structural gene encoding a peroxidase. 
The seed-coat specific DNA regulatory region may also be used to control the 
expression of other genes of interest within the seed coat. 

BACKGROUND OF THE INVENTION 

Full citations for references appear at the end of the Examples section.

Peroxidases are enzymes catalyzing oxidative reactions that use H2O2 as an
electron acceptor. These enzymes are widespread and occur ubiquitously in plants as
isozymes that may be distinguished by their isoelectric points. Plant peroxidases
contribute to the structural integrity of cell walls by functioning in lignin biosynthesis
and suberization, and by forming covalent cross-linkages between extensin, cellulose,
pectin and other cell wall constituents (Campa, 1991). Peroxidases are also associated
with plant defence responses and resistance to pathogens (Bowles, 1990;
Moerschbacher, 1992). Soybeans contain 3 anionic isozymes of peroxidase with a
minimum Mr of 37 kDa (Sessa and Anderson, 1981). Recently one peroxidase
isozyme, localized within the seed coat of soybean, has been characterized with a Mr
of 37 kDa (Gillikin and Graham, 1991).
In an analysis of soybean seeds, Buttery and Buzzell (1968) showed that the
amount of peroxidase activity present in seed coats may vary substantially among
different cultivars. The presence of a single dominant gene Ep causes a high seed coat
peroxidase phenotype (Buzzell and Buttery, 1969). Homozygous recessive epep plants
are ~100-fold lower in seed coat peroxidase activity. This results from a reduction in
the amount of peroxidase enzyme present, primarily in the hourglass cells of the
subepidermis (Gijzen et al., 1993). In plants carrying the Ep gene, peroxidase is
heavily concentrated in the hourglass cells (osteosclereids). These cells form a highly
differentiated cell layer with thick, elongated secondary walls and large intercellular
spaces (Baker et al., 1987). Hourglass cells develop between the epidermal
macrosclereids and the underlying articulated parenchyma, and are a prominent feature
of seed coat anatomy at full maturity. The cytoplasm exudes from the hourglass cells
upon imbibition with water and a distinct peroxidase isozyme constitutes 5 to 10%
of the total soluble protein in EpEp seed coats. It is not known why the hourglass cells
accumulate large amounts of peroxidase, but the sheer abundance and relative purity
of the enzyme in soybean seed coats is significant because peroxidases are versatile
enzymes with many commercial and industrial applications. Studies of soybean seed
coat peroxidase have shown this enzyme to have useful catalytic properties and a high
degree of thermal stability even at extremes of pH (McEldoon et al., 1995). These
properties result in the preferred use of soybean peroxidase, over that of horseradish
peroxidase, in diagnostic assays as an enzyme label for antigens, antibodies,
oligonucleotide probes, and within staining techniques. Johnson et al. report on the use
of soybean peroxidase for the deinking of printed waste paper (U.S. 5,270,770;
December 6, 1994) and for the biocatalytic oxidation of primary alcohols (U.S.
5,391,488; February 13, 1996). Soybean peroxidase has also been used as a
replacement for chlorine in the pulp and paper industry, or as a formaldehyde
replacement (Freiberg, 1995).

An anionic soybean peroxidase from seed coats has been purified (Gillikin and
Graham, 1991). This protein has a pI of 4.1 and Mr of 37 kDa. A method for the
bulk extraction of peroxidase from seed hulls of soybean using a freeze thaw technique
has also been reported (U.S. 5,491,085; February 13, 1996; Pokara and Johnson).

Lagrimini et al. (1987) disclose the cloning of a ubiquitous anionic peroxidase
in tobacco encoding a protein of Mr of 36 kDa. This peroxidase has also been
overexpressed in transgenic tobacco plants (Lagrimini et al., 1990) and Maliyakal
discloses the expression of this gene in cotton (WO 95/08914).



Huangpu et al. (1995) reported the partial cloning of a soybean anionic seed coat
peroxidase. The 1031 bp sequence contained an open reading frame of 849 bp
encoding a 283 amino acid protein with a Mr of 30,577. The Mr of this peroxidase is
7 kDa less than what one would expect for a soybean seed coat peroxidase as reported
by Gillikin and Graham (1991) and possibly represents another peroxidase isozyme
within the seed coat.
The upstream promoter sequences for two poplar peroxidases have been
described by Osakabe et al. (1995). A number of characteristic regulatory sites were
identified from comparison of these sequences to existing promoter elements.
Additionally, a cryptic promoter with apparent specificity for seed coat tissues was
isolated from tobacco by a promoter trapping strategy (Fobert et al., 1994). The
upstream regulatory sequences associated with the Ep gene in soybean are distinct from
these and other previously characterized promoters. The soybean Ep promoter drives
high-level expression in a cell and tissue specific manner. The peroxidase protein
encoded by the Ep gene accumulates in the seed coat tissues, especially in the
hourglass cells of the subepidermis. Minimal expression of the gene is detected in root
tissues.

One problem arising from the desired use of soybean seed coat peroxidase is
that there is variability between soybean varieties regarding peroxidase production
(Buttery and Buzzell, 1968; Freiberg, 1995). Due to the commercial interest in the use
of soybean seed coat peroxidase, new methods of producing this enzyme are required.
Therefore, the gene responsible for the expression of the 37 kDa isozyme in soybean
seed coat was isolated and characterized.

Furthermore, novel regulatory regions obtained from the genomic DNA of
soybean seed coat peroxidase have been isolated and characterized and are useful in
directing the expression of genes of interest in seed coat tissues.
SUMMARY OF THE INVENTION

The present invention relates to a DNA molecule that encodes a soybean seed
coat peroxidase and associated DNA regulatory regions.

This invention also embraces isolated DNA molecules comprising the nucleotide
sequence of either SEQ ID NO:1 (the cDNA encoding soybean seed coat peroxidase)
or SEQ ID NO:2 (the genomic sequence).

This invention also provides for a chimeric DNA molecule comprising a seed
coat-specific regulatory region having nucleotides 1-1532 of SEQ ID NO:2 and a gene
of interest under control of this DNA regulatory region. Also included within this
invention are chimeric DNA molecules comprising genomic DNA sequences
exemplified by nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2.
Furthermore, this invention is directed to isolated DNA molecules comprising at least

1) 24 contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID
NO:2;

2) 32 contiguous nucleotides selected from nucleotides 412-1041 of SEQ
ID NO:2;

3) 23 contiguous nucleotides selected from nucleotides 1234-2263 of SEQ
ID NO:2; or

4) 22 contiguous nucleotides selected from nucleotides 2430-2691 of SEQ
ID NO:2.
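
As an illustration of what "N contiguous nucleotides selected from" a region means in practice, the following minimal Python sketch enumerates such fragments and counts the possible start positions for each of the four regions recited above. The function names and the toy sequence are assumptions made for illustration only; SEQ ID NO:2 itself is not reproduced here.

```python
# Regions (1-based, inclusive) and minimum fragment lengths as recited above.
CLAIMED_REGIONS = [
    (1, 1532, 24),
    (412, 1041, 32),
    (1234, 2263, 23),
    (2430, 2691, 22),
]


def contiguous_fragments(sequence, start, end, length):
    """Yield every fragment of `length` contiguous nucleotides that lies
    wholly inside positions start..end (1-based, inclusive) of `sequence`."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("region must lie inside the sequence")
    region = sequence[start - 1:end]
    for offset in range(len(region) - length + 1):
        yield region[offset:offset + length]


def count_fragments(start, end, length):
    """Number of start positions available for a fragment of `length` nt."""
    return max(0, (end - start + 1) - length + 1)


if __name__ == "__main__":
    for start, end, length in CLAIMED_REGIONS:
        n = count_fragments(start, end, length)
        print(f"nt {start}-{end}: {n} fragments of {length} nt")
    # Toy sequence (hypothetical), region 2..5, fragments of 3 nt
    print(list(contiguous_fragments("ACGTAC", 2, 5, 3)))
```

Each region of L nucleotides offers L - N + 1 windows of N contiguous nucleotides, so the count depends only on the region boundaries and the stated minimum length.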
The present invention also provides for vectors which comprise DNA molecules
encoding soybean seed coat peroxidase. Such a construct may include the DNA
regulatory region from SEQ ID NO:2, including nucleotides 1-1532, or at least 24
contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID NO:2 in
conjunction with the seed coat peroxidase gene, or the seed coat peroxidase gene under
the control of any suitable constitutive or inducible promoter of interest.

This invention is also directed towards vectors which comprise a gene of
interest placed under the control of a DNA regulatory element derived from the
genomic sequence encoding soybean seed coat peroxidase. Such a regulatory element
includes nucleotides 1-1532 of SEQ ID NO:2, or at least 24 contiguous nucleotides
selected from nucleotides 1-1532 of SEQ ID NO:2. Elements comprising nucleotides
412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2, or 32 contiguous nucleotides
selected from nucleotides 412-1041 of SEQ ID NO:2, 23 contiguous nucleotides
selected from nucleotides 1234-2263 of SEQ ID NO:2, or 22 contiguous nucleotides
selected from nucleotides 2430-2691 of SEQ ID NO:2 may also be used.

This invention also embraces prokaryotic and eukaryotic cells comprising the 
vectors identified above. Such cells may include bacterial, insect, mammalian, and 
plant cell cultures. 

This invention also provides for transgenic plants comprising the seed coat
peroxidase gene under control of constitutive or inducible promoters. Furthermore,
this invention also relates to transgenic plants comprising the DNA regulatory regions
of nucleotides 1-1532 of SEQ ID NO:2 controlling a gene of interest, or comprising
genes of interest in functional association with genomic DNA sequences exemplified
by nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2. Also embraced
by this invention are transgenic plants having regulatory regions comprising at least 24
contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID NO:2, 32
contiguous nucleotides selected from nucleotides 412-1041 of SEQ ID NO:2, 23
contiguous nucleotides selected from nucleotides 1234-2263 of SEQ ID NO:2, or 22
contiguous nucleotides selected from nucleotides 2430-2691 of SEQ ID NO:2.

This invention is also directed to a method for the production of soybean seed 
coat peroxidase in a host cell comprising: 

i) transforming the host cell with a vector comprising an oligonucleotide 
sequence that encodes soybean seed coat peroxidase; and 

ii) culturing the host cell under conditions to allow expression of the
soybean seed coat peroxidase.

This invention also provides for a process for producing a heterologous gene
of interest within seed coats of a transformed plant, comprising propagating a plant
transformed with a vector comprising a gene of interest under the control of
nucleotides 1-1532 of SEQ ID NO:2. Furthermore, this invention embraces a process
for producing a heterologous gene of interest within seed coats of a transformed plant,
comprising propagating a plant transformed with a vector comprising a gene of interest
under the control of a regulatory region comprising at least 24 nucleotides selected
from nucleotides 1-1532 of SEQ ID NO:2.

Although the present invention is exemplified by a soybean seed coat peroxidase
and adjacent DNA regulatory regions, in practice any gene of interest can be placed
downstream from the DNA regulatory region for seed coat specific expression.
BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the invention will become more apparent from the
following description in which reference is made to the appended drawings wherein:

Figure 1 is the cDNA and deduced amino acid sequence of soybean seed coat
peroxidase. Nucleotides are numbered by assigning +1 to the first base of the
ATG start codon; amino acids are numbered by assigning +1 to the N-terminal
Gln residue after cleavage of the putative signal sequence. The N-terminal
signal sequence, the region of the active site, and the heme-binding domain are
underlined. The numerals I, II and III placed directly above single nucleotide
gaps in the sequence indicate the three intron splice positions. The target site
and direction of five different PCR primers are shown with dotted lines above
the nucleotide sequence. An asterisk (*) marks the translation stop codon.

Figure 2 is the genomic DNA sequence of the soybean seed coat peroxidase.

Figure 3 is a comparison of soybean seed coat peroxidase with other closely related
plant peroxidases. The GenBank accession numbers are provided next to the
name of the plant from which the peroxidase was isolated. The accession
number for the soybean sequence is L78163. (A) A comparison of the nucleic
acid sequences; (B) A comparison of the amino acid sequences.
Figure 4 shows restriction fragment length polymorphisms between EpEp and epep
genotypes using the seed coat peroxidase cDNA as probe. Genomic DNA of
soybean lines OX312 (epep) and OX347 (EpEp) was digested with restriction
enzyme, separated by electrophoresis in a 0.5% agarose gel, transferred to
nylon, and hybridized with 32P-labelled cDNA encoding the seed coat
peroxidase. The size of the hybridizing fragments was estimated by comparison
to standards and is indicated on the right.

Figure 5 exhibits the structure of the Ep locus. A 17 kb fragment including the Ep
locus is illustrated schematically. A 3.3 kb portion of the gene is enlarged and
exons and introns are represented by shaded and open boxes, respectively. The
final enlargement of the 5' region shows the location and DNA sequence
around the 87 bp deletion occurring in the ep allele of soybean line OX312.
Nucleotides are numbered by assigning +1 to the first base of the ATG start
codon.

Figure 6 displays PCR analysis of EpEp and epep genotypes using primers derived
from the seed coat peroxidase cDNA. Genomic DNA from soybean lines
OX312 (epep) and OX347 (EpEp) was used as template for PCR analysis with
four different primer sets. Amplification products were separated by
electrophoresis through a 0.8% agarose gel and visualized under UV light after
staining with ethidium bromide. Genotype and primer combinations are
indicated at the top of the figure. The sizes in base pairs of the amplified DNA
fragments are indicated on the right.

Figure 7 exhibits PCR analysis of an F2 population from a cross of EpEp and epep
genotypes. Genomic DNA was used as template for PCR analysis of the
parents (P) and 30 F2 individuals. The cross was derived from the soybean lines
OX312 (epep) and OX347 (EpEp). Plants were self-pollinated and seeds were
collected and scored for seed coat peroxidase activity. The symbols (-) and (+)
indicate low and high seed coat peroxidase activity, respectively. Primers
prx9+ and prx10- were used in the amplification reactions. Products were
separated by electrophoresis through a 0.8% agarose gel and visualized under
UV light after staining with ethidium bromide. The migration of molecular
markers and their corresponding size in kb is also shown (lanes M).

Figure 8 displays PCR analysis of six different soybean cultivars with primers derived
from the seed coat peroxidase cDNA sequence. Genomic DNA was used as
template for PCR analysis of three EpEp cultivars and three epep cultivars.
Primers used in the amplification reactions and the size of the DNA product are
indicated on the left. Products were separated by electrophoresis through a
0.8% agarose gel and visualized under UV light after staining with ethidium
bromide.

(A) Forward and reverse primers are downstream from deletion

(B) Forward primer anneals to site within deletion

(C) Primers span deletion

Figure 9 shows the accumulation of peroxidase RNA in tissues of EpEp and epep
plants. Figure 9(A): A comparison of peroxidase transcript abundance
in cultivars Harosoy 63 (Ep) or Marathon (ep). Seed and pod tissues
were sampled at a late stage of development corresponding to a whole
seed fresh weight of 250 mg. Root and leaf tissue was from six week
old plants. Autoradiograph exposed for 96 h. Figure 9(B):
Developmental expression of peroxidase in cultivar Harosoy 63 (Ep).
Flowers were sampled immediately after opening. Seed coat tissues
were sampled at four stages of development corresponding to a whole
seed fresh weight of: lane 1, 50 mg; lane 2, 100 mg; lane 3, 200 mg;
lane 4, 250 mg. Autoradiograph exposed for 20 h.
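
The three primer placements listed for Figure 8 lead to different expected outcomes for the Ep allele and for an ep allele carrying the 87 bp deletion. The following Python sketch states that logic explicitly. It is an illustration only: the strategy names and the 500 bp amplicon size are hypothetical and are not the product sizes shown in the figures.

```python
DELETION_BP = 87  # size of the deletion in the ep allele, as stated in the text


def expected_product(strategy, allele, ep_allele_size_bp):
    """Return the expected amplicon size in bp, or None if no product is expected.

    strategy: "downstream" (both primers downstream from the deletion),
              "within"     (forward primer anneals inside the deleted segment),
              "spanning"   (primers flank the deletion).
    allele:   "Ep" (intact) or "ep" (carries the deletion).
    ep_allele_size_bp: hypothetical amplicon size obtained from the intact allele.
    """
    if allele not in ("Ep", "ep"):
        raise ValueError("allele must be 'Ep' or 'ep'")
    if strategy == "downstream":
        # The deletion lies outside the amplified segment: same size for both.
        return ep_allele_size_bp
    if strategy == "within":
        # The primer binding site is missing from the deleted allele.
        return ep_allele_size_bp if allele == "Ep" else None
    if strategy == "spanning":
        # The deleted allele gives a product shorter by the deletion length.
        if allele == "Ep":
            return ep_allele_size_bp
        return ep_allele_size_bp - DELETION_BP
    raise ValueError("unknown primer strategy")


if __name__ == "__main__":
    for strategy in ("downstream", "within", "spanning"):
        for allele in ("Ep", "ep"):
            print(strategy, allele, expected_product(strategy, allele, 500))
```

Under these assumptions, only the "within" and "spanning" placements distinguish the two alleles, the first by presence or absence of a product and the second by a size shift of 87 bp.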
DESCRIPTION OF PREFERRED EMBODIMENT

The present invention is directed to a novel oligonucleotide sequence encoding
a seed coat peroxidase and associated DNA regulatory regions.

According to the present invention, DNA sequences that are "substantially
homologous" include sequences that are identified under conditions of high
stringency. "High stringency" refers to Southern hybridization conditions employing
washes at 65 °C with 0.1 x SSC, 0.5% SDS.

By "DNA regulatory region" it is meant any region within a genomic sequence
that has the property of controlling the expression of a DNA sequence that is operably
linked with the regulatory region. Such regulatory regions may include promoter or
enhancer regions, and other regulatory elements recognized by one of skill in the art.
A segment of the DNA regulatory region is exemplified in this invention; however, as
is understood by one of skill in the art, this region may be used as a probe to identify
surrounding regions involved in the regulation of adjacent DNA, and such surrounding
regions are also included within the scope of this invention.

In the context of this disclosure, the term "promoter" or "promoter region"
refers to a sequence of DNA, usually upstream (5') to the coding sequence of a
structural gene, which controls the expression of the coding region by providing the
recognition for RNA polymerase and/or other factors required for transcription to start
at the correct site.
There are generally two types of promoters, inducible and constitutive. An
"inducible promoter" is a promoter that is capable of directly or indirectly activating
transcription of one or more DNA sequences or genes in response to an inducer. In
the absence of an inducer the DNA sequences or genes will not be transcribed.
Typically the protein factor that binds specifically to an inducible promoter to activate
transcription is present in an inactive form which is then directly or indirectly
converted to the active form by the inducer. The inducer can be a chemical agent such
as a protein, metabolite, growth regulator, herbicide or phenolic compound or a
physiological stress imposed directly by heat, cold, salt, or toxic elements or indirectly
through the action of a pathogen or disease agent such as a virus. A plant cell
containing an inducible promoter may be exposed to an inducer by externally applying
the inducer to the cell or plant such as by spraying, watering, heating or similar
methods.

By "constitutive promoter" it is meant a promoter that directs the expression
of a gene throughout the various parts of a plant and continuously throughout plant
development. Examples of known constitutive promoters include those associated with
the CaMV 35S transcript and Agrobacterium Ti plasmid nopaline synthase gene.

The chimeric gene constructs of the present invention can further comprise a
3' untranslated region. A 3' untranslated region refers to that portion of a gene
comprising a DNA segment that contains a polyadenylation signal and any other
regulatory signals capable of effecting mRNA processing or gene expression. The
polyadenylation signal is usually characterized by effecting the addition of polyadenylic
acid tracks to the 3' end of the mRNA precursor. Polyadenylation signals are
commonly recognized by the presence of homology to the canonical form 5'-AATAAA-3',
although variations are not uncommon.
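
A simple way to locate candidate polyadenylation signals of the canonical form is to scan a sequence for the hexamer AATAAA. The sketch below is a minimal illustration using an exact match on one strand; the function name and the toy sequence are assumptions, and, as noted above, real signals may vary from the canonical form and would not be found by this search.

```python
import re

CANONICAL_SIGNAL = "AATAAA"


def find_polya_signals(dna, motif=CANONICAL_SIGNAL):
    """Return the 1-based start position of every occurrence of `motif`
    in `dna`, including overlapping occurrences. Case-insensitive."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() + 1 for match in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    toy_sequence = "GGAATAAACCCTTAATAAAGG"  # hypothetical example sequence
    print(find_polya_signals(toy_sequence))
```

Positions are reported on a 1-based scale to match the base-pair numbering convention used for the sequence listings.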

Examples of suitable 3' regions are the 3' transcribed non-translated regions
containing a polyadenylation signal of Agrobacterium tumour inducing (Ti) plasmid
genes, such as the nopaline synthase (Nos gene) and plant genes such as the soybean
storage protein genes and the small subunit of the ribulose-1,5-bisphosphate
carboxylase (ssRUBISCO) gene. The 3' untranslated region from the structural gene
of the present construct can therefore be used to construct chimeric genes for
expression in plants.

The chimeric gene construct of the present invention can also include further
enhancers, either translation or transcription enhancers, as may be required. These
enhancer regions are well known to persons skilled in the art, and can include the ATG
initiation codon and adjacent sequences. The initiation codon must be in phase with
the reading frame of the coding sequence to ensure translation of the entire sequence.
The translation control signals and initiation codons can be from a variety of origins,
both natural and synthetic. Translational initiation regions may be provided from the
source of the transcriptional initiation region, or from the structural gene. The
sequence can also be derived from the promoter selected to express the gene, and can
be specifically modified so as to increase translation of the mRNA.
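
The requirement that the initiation codon be in phase with the reading frame can be checked mechanically when a construct is designed. The following Python sketch is a minimal illustration under stated assumptions: the function names and both toy constructs are hypothetical, and the check only looks for a premature stop codon in the frame set by the ATG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_phase(atg_index, coding_start_index):
    """True when the coding sequence starts a whole number of codons
    after the ATG (both indices are 0-based)."""
    return (coding_start_index - atg_index) % 3 == 0


def codons_from(dna, atg_index):
    """Split `dna` into complete codons, starting at the ATG."""
    dna = dna.upper()
    if dna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    return [dna[i:i + 3] for i in range(atg_index, len(dna) - 2, 3)]


def reads_through(dna, atg_index):
    """True when no stop codon occurs before the final codon in the
    frame defined by the ATG."""
    codons = codons_from(dna, atg_index)
    return not any(codon in STOP_CODONS for codon in codons[:-1])


if __name__ == "__main__":
    in_frame = "ATGGCTAAATGGTAA"   # ATG GCT AAA TGG TAA
    shifted = "ATGCGCTAAATGGTAA"   # one extra base after the ATG
    print(reads_through(in_frame, 0), reads_through(shifted, 0))
```

In the shifted toy construct a single inserted base moves the frame so that a stop codon is read at the third codon, which is the kind of truncation the in-phase requirement is meant to avoid.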
To aid in identification of transformed plant cells, the constructs of this
invention may be further manipulated to include plant selectable markers. Useful
selectable markers include enzymes which provide for resistance to an antibiotic such
as gentamycin, hygromycin, kanamycin, and the like. Similarly, enzymes providing
for production of a compound identifiable by colour change, such as GUS
(β-glucuronidase), or luminescence, such as luciferase, are useful.

Also considered part of this invention are transgenic plants containing the
chimeric gene construct of the present invention. Methods of regenerating whole
plants from plant cells are known in the art, and the method of obtaining transformed
and regenerated plants is not critical to this invention. In general, transformed plant
cells are cultured in an appropriate medium, which may contain selective agents such
as antibiotics, where selectable markers are used to facilitate identification of
transformed plant cells. Once callus forms, shoot formation can be encouraged by
employing the appropriate plant hormones in accordance with known methods and the
shoots transferred to rooting medium for regeneration of plants. The plants may then
be used to establish repetitive generations, either from seeds or using vegetative
propagation techniques.

The constructs of the present invention can be introduced into plant cells using
Ti plasmids, Ri plasmids, plant virus vectors, direct DNA transformation,
micro-injection, electroporation, etc. For reviews of such techniques see for example
Weissbach and Weissbach (1988) and Geierson and Corey (1988). The present
invention further includes a suitable vector comprising the chimeric gene construct.

Buttery and Buzzell (1968) showed that the amount of peroxidase activity
present in seed coats may vary substantially among different cultivars. The presence
of a single dominant gene Ep causes a high seed coat peroxidase phenotype (Buzzell
and Buttery, 1969). Homozygous recessive epep plants are ~100-fold lower in seed
coat peroxidase activity. This results from a reduction in the amount of peroxidase
enzyme present, primarily in the hourglass cells of the subepidermis (Gijzen et al.,
1993). In plants carrying the Ep gene, peroxidase is heavily concentrated in the
hourglass cells (osteosclereids). These cells form a highly differentiated cell layer with
thick, elongated secondary walls and large intercellular spaces (Baker et al., 1987).

Screening a seed coat cDNA library prepared from EpEp plants with a
degenerate primer derived from the active site domain of plant peroxidase resulted in
a high frequency of positive clones. Many of these clones encode identical cDNA
molecules and indicate that the corresponding mRNA is an abundant transcript in
developing seed coat tissues. The sequence of the cDNA is shown in Figure 1.
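
A degenerate primer is a mixture of related oligonucleotides written with IUPAC ambiguity codes, so that one primer covers every codon choice for a conserved peptide motif. The sketch below expands such a primer into its individual sequences and reports its degeneracy. The example primer is hypothetical and is shown only to illustrate the notation; it is not the primer used in the screening described above.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_degenerate(primer):
    """List every concrete sequence represented by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(choice) for choice in product(*pools)]


if __name__ == "__main__":
    example = "CAYTTYCAYGAYTG"  # hypothetical primer, Y = C or T
    print(degeneracy(example))
    print(expand_degenerate(example)[:4])
```

Keeping the degeneracy low, for example by choosing a motif rich in amino acids with few codons, limits the number of distinct oligonucleotides in the mixture.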

Previous studies on soybean seed coat peroxidase indicated that this enzyme is
heavily glycosylated and that carbohydrate contributes 18% of the mass of the
apo-enzyme (Gray et al., 1996). The seven potential glycosylation sites identified from the
amino acid sequence of the seed coat peroxidase (Figure 1) would accommodate the
five or six N-linked glycosylation sites proposed by Gray et al. (1996). The
heme-binding domain encompasses residues Asp161 to Phe171 and the acid-base catalysis
region from Gly33 to Cys44. The two regions are highly conserved among plant
peroxidases and are centred around functional histidine residues, His169 and His40.
There are eight conserved cysteine residues in the mature protein that provide for four
disulfide bridges found in other plant peroxidases and predicted from the crystal
structure of peanut peroxidase (Welinder, 1992; Schuller et al., 1996). Other
conserved areas include residues Cys91 to Ala105 and Val119 to Leu12^ that occur in
or around helix D. The most divergent aspects of the seed coat peroxidase protein
sequence are the carboxy- and amino-terminal regions. These sequences probably
provide special targeting signals for the proper processing and delivery of the peptide
chain. It is possible the carboxy-terminal extension of the seed coat peroxidase is
removed at maturity, as has been shown for certain barley and horseradish peroxidases
(Welinder, 1992).

The molecular mass of the enzyme has been determined by denaturing gel
electrophoresis to be 37 kDa (Sessa and Anderson, 1981; Gillikin and Graham, 1991)
or 43 kDa (Gijzen et al., 1993). Analysis by mass spectrometry indicated a mass of
40,622 Da for the apo-enzyme and 33,250 Da after deglycosylation (Gray et al.,
1996). These values are in good agreement with the mass of 35,377 Da calculated from
the predicted amino acid sequence for the mature apo-protein prior to glycosylation and
other modifications. Huangpu et al. (1995) reported an anionic seed coat peroxidase
having a Mr of 30,577 Da and characterized a partial cDNA encoding this protein.
This 1031 bp cDNA contained an open reading frame of 849 bp encoding a 283 amino
acid protein. There are several differences between this reported sequence and the
sequence of this invention that are manifest at the amino acid level (see Figure 3 for
sequence comparison). The enzyme encoded by the gene reported by Huangpu et al.
is different from that of this invention as the peroxidase of this invention has a Mr of
35,377 Da.



Genomic DNA blots probed with the seed coat peroxidase cDNA produced two
or three hybridizing fragments of varying intensity with most restriction enzyme
digestions, even though several peroxidase isozymes are present in soybean. The
results indicate that this seed coat peroxidase is present as a single gene that does
not share sufficient homology with most other peroxidase genes to anneal under
conditions of high stringency.



The genomic DNA sequence comprises four exons spanning bp 1533-1751
(exon 1), 2383-2574 (exon 2), 3605-3769 (exon 3) and 4033-4516 (exon 4), and three
introns comprising bp 1752-2382 (intron 1), 2575-3604 (intron 2) and 3770-4032
(intron 3), of SEQ ID NO:2. Features of the upstream regulatory region of the genomic
DNA include a TATA box centred on bp 1487 and a cap signal 32 bp downstream centred on
bp 1520. Also noted within the genomic sequence are three polyadenylation signals
centred on bp 4520, 4598, 4663 and a polyadenylation site at bp 4700.
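The exon and intron coordinates can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes 1-based inclusive coordinates on SEQ ID NO:2 and uses exon 1 = 1533-1751 and intron 3 = 3770-4032, the values that keep neighbouring features contiguous and that agree with the intron sizes of 631, 1030 and 263 bp quoted in Example 3. The dictionary and function names are hypothetical.

```python
# Feature coordinates on SEQ ID NO:2 (1-based, inclusive), as assumed above.
EXONS = {
    "exon 1": (1533, 1751),
    "exon 2": (2383, 2574),
    "exon 3": (3605, 3769),
    "exon 4": (4033, 4516),
}
INTRONS = {
    "intron 1": (1752, 2382),
    "intron 2": (2575, 3604),
    "intron 3": (3770, 4032),
}


def span_length(start, end):
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


def is_contiguous(exons, introns):
    """True when exons and introns tile the gene without gaps or overlaps."""
    segments = sorted(list(exons.values()) + list(introns.values()))
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(segments, segments[1:]))


exon_lengths = {name: span_length(*coords) for name, coords in EXONS.items()}
intron_lengths = {name: span_length(*coords) for name, coords in INTRONS.items()}

if __name__ == "__main__":
    print("exons:", exon_lengths)
    print("introns:", intron_lengths)
    print("contiguous:", is_contiguous(EXONS, INTRONS))
```

With these inputs the intron lengths come out as 631, 1030 and 263 bp, matching the sizes stated for the three introns elsewhere in this description.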



This promoter is considered seed coat specific since the peroxidase protein
encoded by the Ep gene accumulates in the seed coat tissues, especially in the hourglass
cells of the subepidermis, and is not expressed in other tissues, aside from a marginal
expression of peroxidase in the root tissues. This is also true at the transcriptional
level (see Figure 9). The DNA regulatory regions of the genomic sequence of Figure
2 are used to control the expression of the adjacent peroxidase gene in seed coat tissue.
Such regulatory regions include nucleotides 1-1532. Other regions of interest include
nucleotides 1752-2382, 2575-3604 and/or 3770-4032 of SEQ ID NO:2. Therefore
other proteins of interest may be expressed in seed coat tissues by placing a gene
capable of expressing the protein of interest under the control of the DNA regulatory
elements of this invention. Genes of interest include but are not restricted to herbicide
resistance genes, genes encoding viral coat proteins, or genes encoding proteins
conferring biological control of pests or pathogens such as an insecticidal protein, for
example B. thuringiensis toxin. Other genes include those capable of the production
of proteins that alter the taste of the seed and/or that affect the nutritive value of the
soybean.



A modified DNA regulatory sequence may be obtained by introducing changes
into the natural sequence. Such modifications can be done through techniques known
to one of skill in the art such as site-directed mutagenesis, reducing the length of the
regulatory region using endonucleases or exonucleases, or increasing the length through
the insertion of linkers or other sequences of interest. Reducing the size of the DNA
regulatory region may be achieved by removing 3' or 5' regions of the regulatory
region of the natural sequence by using an endonuclease such as BAL 31 (Sambrook et
al., 1989). However, any such DNA regulatory region must still function as a seed coat
specific DNA regulatory region.
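As an illustration of how a series of shortened regulatory regions might be planned on paper before any bench work, the sketch below enumerates nested 5' truncations of a sequence. It is a minimal example under stated assumptions: the function name, the step size, the minimum length and the 40-nt placeholder sequence are all hypothetical and are not taken from this description, and real nuclease digestion does not produce such evenly spaced end points.

```python
def five_prime_deletions(region, step, min_length):
    """Yield (bases_removed, truncated_sequence) for nested 5' deletions.

    Each truncation keeps the 3' end of the region intact, so the junction
    with the downstream coding sequence is unchanged.
    """
    if step <= 0:
        raise ValueError("step must be a positive number of bases")
    removed = 0
    while len(region) - removed >= min_length:
        yield removed, region[removed:]
        removed += step


# Hypothetical 40-nt placeholder, NOT the promoter of SEQ ID NO:2.
placeholder_region = "ACGT" * 10
series = list(five_prime_deletions(placeholder_region, step=10, min_length=15))

if __name__ == "__main__":
    for removed, seq in series:
        print(f"-{removed:>2} nt from 5' end: {len(seq)} nt retained")
```

Each candidate produced this way would still have to be tested experimentally for seed coat specific activity, as required above.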



It may be readily determined if such modified DNA regulatory elements are
capable of acting in a seed coat specific manner by transforming plant cells with such
regulatory elements controlling the expression of a suitable marker gene, culturing
these plants and determining the expression of the marker gene within the seed coat as
outlined above. One may also analyze the efficacy of DNA regulatory elements by
introducing constructs comprising a DNA regulatory element of interest operably
linked with an appropriate marker into seed coat tissues by using particle bombardment
directed to seed coat tissue and determining the degree of expression of the regulatory
region as is known to one of skill in the art.

Two tandemly arranged genes encoding anionic peroxidases expressed in stems
of Populus kitakamiensis, prxA3a and prxA4a, have been cloned and characterized
(Osakabe et al., 1995). Both of these genomic sequences contained four exons and
three introns and encoded proteins of 347 and 343 amino acids, respectively. The two
genes encode distinct isozymes with deduced Mr values of 33.9 and 34.6 kDa.
Furthermore, a 532 bp promoter derived from the peroxidase gene of Armoracia
rusticana has also been reported (Toyobo KK, JP 4,126,088, April 27, 1992).
However, a search using GenBank revealed no substantial similarity between the 
promoter region, or introns 1, 2 and 3 of this invention and those within the literature.

Digestion of the genomic DNA with BamHI or SacI revealed restriction
fragment length polymorphisms that distinguished EpEp and epep genotypes. Although
the XbaI digestion did not produce a readily detectable polymorphism, the size of the
hybridizing fragment in both genotypes was ~14 kb. Thus, a 0.3 kb size difference is
outside of the resolving power of the separation for fragments this large. Sequence
analysis of EpEp and epep genotypes indicates that the mutant ep allele is missing 87
bp of sequence at the 5' end of the structural gene. This would account for the
drastically reduced amounts of peroxidase enzyme present in seed coats of epep plants
since the deletion includes the translation start codon and the entire N-terminal signal
sequence. However, the 87 bp deletion cannot account for the differences observed in
the RFLP analysis since the missing fragment does not include a BamHI site and is
much smaller than the 0.3 kb polymorphism detected in the SacI digestion. Thus,
other genetic rearrangements must occur in the vicinity of the ep locus that lead to
these polymorphisms.

The results shown here indicate that the mutation causing low seed coat
peroxidase activity occurs in the structural gene encoding the enzyme. This mutation
is an 87 bp deletion in the 5' region of the gene encompassing the translation start site.
Several different low peroxidase cultivars share a similar mutation in the same area,
suggesting that the recessive ep alleles have a common origin or that the region is
prone to spontaneous deletions or rearrangements.

Due to the industrial interest in soybean seed coat peroxidase, alternate sources
for the production of this enzyme are needed. The DNA of this invention, encoding 
the seed coat soybean peroxidase under the control of a suitable promoter and 
expressed within a host of interest, can be used for the preparation of recombinant 
soybean seed coat peroxidase enzyme. 

Soybean seed coat peroxidase has been characterized as a lignin-type peroxidase
that has industrially significant properties, i.e.: high activity and stability under acidic
conditions; wide substrate specificity; catalytic properties equivalent to those
of Phanerochaete chrysosporium lignin peroxidase (the currently preferred enzyme used
for treatment of industrial waste waters; Wick 1995) while being at least 150-fold more
stable; and greater stability than horseradish peroxidase, which is also used in
industrial effluent treatments and medical diagnostic kits (McEldoon et al., 1995).
These properties are useful within industrial applications for the degradation of
natural aromatic polymers including lignin and coal (McEldoon et al., 1995), and
underlie the preferred use of soybean peroxidase, over that of horseradish peroxidase,
in medical diagnostic tests as an enzyme label for antigens, antibodies, oligonucleotide
probes, and within staining techniques (Wick 1995). Soybean peroxidase is also used in
the deinking of printed waste paper (Johnson et al., U.S. 5,270,770; December 6, 1994)
and for the biocatalytic oxidation of primary alcohols (Johnson et al., U.S. 5,391,488;
February 13, 1996). Soybean peroxidase has also been used as a replacement for chlorine
in the pulp and paper industry, in order to remove chlorine, phenolic or aromatic amine
containing pollutants from industrial waste waters (Wick 1995), or as a formaldehyde
replacement (Freiberg, 1995) for use in adhesives, abrasives, and protective coatings
(e.g. varnish and resins; Wick 1995).

Furthermore, the seed coat peroxidase gene may be expressed in an organ or
tissue specific manner within a plant. For example, the quality and strength of cotton
fibre can be improved through the over-expression of cotton or horseradish peroxidase
placed under the control of a fibre-specific promoter (Maliyakal, WO 95/08914; April
6, 1995).

Similarly, seed-specific DNA regulatory regions of this invention may be used 
to control expression of genes of interest such as: 

i) genes encoding herbicide resistance, or 

ii) biological control of insects or pathogens (e.g. B. thuringiensis), or 

iii) viral coat proteins to protect against viral infections, or 

iv) proteins of commercial interest (e.g. pharmaceutical), and 

v) proteins that alter the nutritive value, taste, or processing of seeds
within the seed coat of plants. 

While this invention is described in detail with particular reference to preferred
embodiments thereof, said embodiments are offered to illustrate but not to limit the
invention.

EXAMPLES

Plant material 

All soybean (Glycine max [L.] Merr.) cultivars and breeding lines were from the
collection at Agriculture Canada, Harrow, Ontario.

Seed Coat cDNA Library Construction and Screening

High seed coat peroxidase (EpEp) soybean cultivar Harosoy 63 plants were
grown in field plots outdoors. Pods were harvested 35 days after flowering and seeds
in the mid-to-late developmental stage were excised. The average fresh mass was 250
mg per seed. Seed coats were dissected and immediately frozen in liquid nitrogen. The
frozen tissue was lyophilized and total RNA extracted in 100 mM Tris-HCl pH 9.0,
20 mM EDTA, 4% (w/v) sarkosyl, 200 mM NaCl, and 16 mM DTT, and precipitated
with LiCl using the standard phenol/chloroform method described by Wang and
Vodkin (1994). The poly(A)+ RNA was purified on oligo(dT) cellulose columns prior
to cDNA synthesis, size selection, ligation into the λ ZAP Express vector, and
packaging, according to instructions (Stratagene). A degenerate oligonucleotide with the
5' to 3' sequence of [sequence illegible in source] was
labelled to high specific activity and used as a probe to isolate peroxidase [illegible]
(Sambrook et al., 1989). Duplicate plaque lifts were made to [illegible] (Amersham),
UV fixed, and prehybridized at 36°C for 3 h in 6 x SSC, 20 mM Na2HPO4 (pH 6.8),
5 x Denhardt's, 0.4% SDS, and 500 µg/mL salmon sperm DNA. Hybridization was
in the same buffer, without Denhardt's, at 36°C for 16 h. Filters were washed quickly
with several changes of 6 x SSC and 0.1% SDS, first at room temperature and finally
at 40°C, prior to autoradiography for 16 h at -70°C with an intensifying screen.
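A degenerate oligonucleotide probe is a pool of related sequences, and the size of that pool follows directly from the number of alternatives at each degenerate position. The sketch below expands a probe written with parenthesised alternatives, for example `(CT)` for "C or T", into its individual sequences. It is an illustration only: the function name is hypothetical, and the short example string is invented for demonstration because the probe sequence used here is not legible in the source.

```python
from itertools import product


def expand_degenerate(oligo):
    """Expand an oligo such as 'TT(CT)CA' into every sequence it represents.

    Plain letters are fixed positions; letters inside parentheses are the
    alternatives allowed at a single position.
    """
    positions = []
    i = 0
    while i < len(oligo):
        if oligo[i] == "(":
            close = oligo.index(")", i)  # raises ValueError if unbalanced
            positions.append(oligo[i + 1:close])
            i = close + 1
        else:
            positions.append(oligo[i])
            i += 1
    return ["".join(combo) for combo in product(*positions)]


# Invented 8-nt example with two two-fold degenerate positions.
example_pool = expand_degenerate("TT(CT)CA(CT)GA")

if __name__ == "__main__":
    print(len(example_pool), "sequences in the pool")
    for seq in example_pool:
        print(seq)
```

The pool size is the product of the number of alternatives at each position, so two two-fold degenerate positions give four sequences.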

Genomic DNA Isolation, Library Construction, and DNA Blot Analysis 

Soybean genomic DNA was isolated from leaves of greenhouse grown plants
or from etiolated seedlings grown in vermiculite. Plant tissue was frozen in liquid
nitrogen and lyophilized before extraction and purification of DNA according to the
method of Dellaporta et al. (1983). Restriction enzyme digestion of 30 µg DNA,
separation on 0.5% agarose gels and blotting to nylon membranes followed standard
protocols (Sambrook et al., 1989). For construction of the genomic library, DNA
purified from Harosoy 63 leaf tissue was partially digested with BamHI and ligated into
the λ FIX II vector (Stratagene). Gigapack XL packaging extract (Stratagene) was used
to select for inserts of 9 to 22 kb. After library amplification, duplicate plaque lifts
were hybridized to cDNA probe.

Blots or filter lifts were prehybridized for 2 h at 65°C in 6 x SSC, 5 x
Denhardt's, 0.5% SDS, and 100 µg/mL salmon sperm DNA. Radiolabelled cDNA
probe (20 to 50 ng) was prepared using the Ready-To-Go labelling kit (Pharmacia) and
32P-dCTP (Amersham). Unincorporated 32P-dCTP was removed by spin column
chromatography before adding radiolabelled cDNA to the hybridization buffer
(identical to prehybridization buffer without Denhardt's). Hybridization was for 20 h
at 65°C. Membranes were washed twice for 15 min at room temperature with 2 x SSC,
0.5% SDS, followed by two 30 min washes at 65°C with 0.1 x SSC, 0.5% SDS.
Autoradiography was for 20 h at -70°C using an intensifying screen and X-OMAT film
(Kodak).

DNA Sequencing 

Sequencing of DNA was performed using dye-labelled terminators and Taq-FS
DNA polymerase (Perkin-Elmer). The PCR protocol consisted of 25 cycles of a 30 sec
melt at 96°C, 15 sec annealing at 50°C, and 4 min extension at 60°C. Samples were
analyzed on an Applied Biosystems 373A Stretch automated DNA sequencer.

Polymerase Chain Reaction 

PCR amplification contained 1 ng template DNA, 5 pmol each primer, 1.5
mM MgCl2, 0.15 mM deoxynucleotide triphosphates mix, 10 mM Tris-HCl, 50 mM
KCl, pH 8.3, and 1 unit of Taq polymerase (Gibco BRL) in a total volume of 25 µL.
Reactions were performed in a [illegible]. Following an initial [illegible]
denaturation at 94°C, there were 35 cycles of 1 min denaturation at 94°C, 1 min
annealing at 52°C, and 2 min extension at 72°C. A final 7 min extension at 72°C
completed the program. The following primers were used for PCR analysis of genomic
DNA:

prx2+     CTTCCAAATATCAACTCAAT
prx6-     TAAAGTTGGAAAAGAAAGTA
prx9+     ATGCATGCAGGTTTTTCAGT
prx10-    TTGCTCGCTTTCTATTGTAT
prx12+    TCTTCGATGCTTCTTTCACC
prx29+    CATAAACAATACGTACGTGAT
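For orientation, basic properties of the six primers can be tabulated with a short script. The sketch below computes length, GC content and a rough melting temperature from the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) °C. This is an assumption made for illustration: the source does not say how the primers were designed or which Tm method, if any, was used, and the Wallace rule is only a coarse estimate for short oligonucleotides. The prx12+ sequence is read here as TCTTCGATGCTTCTTTCACC, which matches the cDNA sequence.

```python
# Primer sequences (5' to 3') as listed above.
PRIMERS = {
    "prx2+": "CTTCCAAATATCAACTCAAT",
    "prx6-": "TAAAGTTGGAAAAGAAAGTA",
    "prx9+": "ATGCATGCAGGTTTTTCAGT",
    "prx10-": "TTGCTCGCTTTCTATTGTAT",
    "prx12+": "TCTTCGATGCTTCTTTCACC",
    "prx29+": "CATAAACAATACGTACGTGAT",
}


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    return sum(1 for base in seq.upper() if base in "GC")


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


def primer_table(primers):
    """Return (name, length, GC percent, Wallace Tm) for each primer."""
    rows = []
    for name, seq in primers.items():
        gc_percent = 100.0 * gc_count(seq) / len(seq)
        rows.append((name, len(seq), round(gc_percent, 1), wallace_tm(seq)))
    return rows


if __name__ == "__main__":
    for name, length, gc_percent, tm in primer_table(PRIMERS):
        print(f"{name:<7} {length:>2} nt  GC {gc_percent:>4}%  Tm ~{tm} °C")
```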



RNA Isolation

For isolation of RNA, tissue was harvested from greenhouse grown plants,
dissected, frozen in liquid nitrogen, and lyophilized prior to extraction. Total RNA was
purified from seed coats, embryos, pods, leaves, and flowers using the standard
phenol/chloroform method (Sambrook et al., 1989). This method did not afford good
yields of RNA from roots, therefore this tissue was extracted with Trizol reagent
(Gibco BRL) and total RNA purified according to the manufacturer's instructions with an
additional phenol-chloroform extraction step. The amount of RNA was estimated by
measuring absorbance at 260 and 280 nm, and by electrophoretic separation in
formaldehyde gels followed by staining with ethidium bromide and comparison to
known standards. Total RNA (10 µg per sample) was prepared, subjected to
electrophoresis through a 1% agarose gel containing formaldehyde, and then stained
with ethidium bromide to ensure equal loading of samples. The gel was blotted to
nylon (Hybond-N, Amersham) according to standard methods and the RNA was fixed
to the membrane by UV cross linking.

Seed Coat Peroxidase Assays

The F3 seed was measured for peroxidase activity to score the phenotype of the
F2 population because the seed testa is derived from maternal tissue. The seeds were
briefly soaked in water and the seed coat was dissected from the embryo and placed in
a vial. Ten drops (~500 µL) of 0.5% guaiacol were added and the sample was left to
stand for 10 min before adding one drop (~50 µL) of 0.1% H2O2. An immediate
change in colour of the solution, from clear to red, indicates a positive result and high
seed coat peroxidase activity.



Example 1: The Seed Coat Peroxidase cDNA and Genomic DNA Sequences

To isolate the seed coat peroxidase transcript, a cDNA library was constructed
from developing seed coat tissue of the EpEp cultivar Harosoy 63. The primary
library contained 10[exponent illegible] recombinant plaque forming units and was
amplified prior to screening. A degenerate 17-mer oligonucleotide corresponding to the
conserved active site domain of plant peroxidases was used to probe the library. In
screening 10,000 plaque forming units, 12 positive clones were identified. The cDNA
insert size of the clones ranged from 0.5 to 2.5 kb, but six clones shared a common
insert size of 1.3 kb. These six clones (soyprx03, soyprx05, soyprx06, soyprx11,
soyprx1[illegible], and soyprx14) were chosen for further characterization since the
1.3 kb insert size matched the expected peroxidase transcript size. Sequence analysis of
the six clones showed that they contained identical cDNA transcripts encoding a
peroxidase and that each resulted from an independent cloning event, since the junction
between the cloning vector and the transcript was different in all cases.

Since it was not clear that the entire 5' end of the cDNA transcript was
complete in any of the cDNA clones isolated, the structural gene corresponding to the
seed coat peroxidase was isolated from a Harosoy 63 genomic library. A partial BamHI
digest of genomic DNA was used to construct the library and more than
10[exponent illegible] plaque forming units were screened using the cDNA probe. A
positive clone, G25-2-1-M, containing a 17 kb insert was identified and a 4.7 kb region
encoding the peroxidase was sequenced (SEQ ID NO:2). This region includes 1532
nucleotides of the 5' region of the peroxidase gene.

The genomic sequence matched the cDNA sequence except for three introns
encoded within the gene. The genomic sequence also revealed two additional
translation start codons, beginning one bp and 10 bp upstream from the 5' end of the
longest cDNA transcript isolated. Figure 1 shows the deduced cDNA sequence. The
open reading frame of 1056 bp encodes a 352 amino acid protein of 38,106 Da. A
heme-binding domain, a peroxidase active site signature sequence, and seven potential
N-glycosylation sites were identified from the deduced amino acid sequence. The first
26 amino acid residues conform to a membrane spanning domain. Cleavage of this
putative signal sequence releases a mature protein of 326 residues with a mass of
35,377 Da and an estimated pI of 4.4.

Relevant features of the genomic fragment (Figure 2) include four exons at bp
[illegible]-411 (exon 1; 1533-1751 of SEQ ID NO:2), 1042-1233 (exon 2; 2383-2574 of
SEQ ID NO:2), 2263-2429 (exon 3; 3605-3769 of SEQ ID NO:2) and 2692-3174
(exon 4; 4033-4516 of SEQ ID NO:2) and three introns at bp 412-1041 (intron 1;
1752-2382 of SEQ ID NO:2), 1234-2263 (intron 2; 2575-3604 of SEQ ID NO:2) and
2430-2691 (intron 3; 3770-4032 of SEQ ID NO:2). The 1532 bp regulatory region of
the genomic DNA includes a TATA box centred on bp 1487 and a cap signal 32 bp
downstream centred at bp 1520 of SEQ ID NO:2. Also noted within the genomic
sequence are three polyadenylation signals centred on bp 4520, 4598, 4700 and a
polyadenylation site at bp 4700 of SEQ ID NO:2.

Figure 3 illustrates the relationship between the soybean seed coat peroxidase
and other selected plant peroxidases. The soybean sequence is most closely related to
four peroxidase cDNAs isolated from alfalfa (see Figure 3), sharing from 65 to 67%
identity at the amino acid level with the alfalfa proteins (X90693, X90694, X90692,
el-Turk et al. 1996; L36156, Abrahams et al. 1994). When compared with other plant
peroxidases, soybean seed coat peroxidase exhibits from 60 to 65% identity with
poplar (D30653 and D30652, Osakabe et al. 1994) and flax (L0554, Omann and Tyson
1995); 50 to 60% identity with horseradish (M37156, Fujiyama et al. 1988), tobacco
(D11396, Osakabe et al. 1993), and cucumber (M91373, Rasmussen et al. 1992); and
49% identity with barley (L36093, Scott-Craig et al. 1994), wheat (X85228, Baga et
al. 1995) and tobacco (L02124, Diaz-De-Leon et al. 1993) peroxidases.

A comparison of the promoter region, 1-1532 of SEQ ID NO:2, indicates that
there are no similar sequences present within the GENBANK database.
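Percent identity figures such as those quoted for the peroxidase comparison depend on how identity is counted over an alignment. The sketch below shows one common convention, identical residues divided by the number of aligned columns in which neither sequence has a gap. This is an assumption for illustration: the source does not state which alignment program or identity definition was used, and the two short strings are invented, not peroxidase sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must already be aligned, i.e. have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Invented toy alignment: 5 ungapped columns, 4 of them identical.
    print(percent_identity("MKT-AL", "MKSGAL"))
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, give somewhat different percentages for the same alignment.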

Example 2: DNA Blot Analysis Using the Seed Coat Peroxidase cDNA Probe
Reveals Restriction Fragment Length Polymorphisms Between EpEp and epep
Genotypes

Genomic DNA blots of OX347 (EpEp) and OX312 (epep) plants were
hybridized with 32P-labelled cDNA to estimate the copy number of the seed coat
peroxidase gene and to determine if this locus is polymorphic between the two
genotypes. Figure 4 shows the hybridization patterns after digestion with BamHI, XbaI,
and SacI. Restriction fragment length polymorphisms are clearly visible in the BamHI
and SacI digestions. The BamHI digestion produced a strongly hybridizing 17 kb
fragment and a faint 3.4 kb fragment in the EpEp genotype. The 3.4 kb BamHI
fragment is visible in the epep genotype but the 17 kb fragment has been replaced by
a signal at > 20 kb. The SacI digestion resulted in detection of three fragments in EpEp
and epep plants. At least two fragments were expected here since the cDNA sequence
has a SacI site within the open reading frame. However, the smallest and most strongly
hybridizing of these fragments is 5.2 kb in EpEp plants and 4.9 kb in epep plants.
Digestion with XbaI produced hybridizing fragments of ~14 kb and 7.8 kb for both
genotypes, with the larger fragment showing a stronger signal.

Example 3: A Deletion Mutation Occurs in the Recessive ep Locus

The structural gene encoding the seed coat peroxidase is schematically
illustrated in Figure 5. The 17 kb BamHI fragment encompassing the gene includes 191
bp of sequence upstream from the translation start codon, three introns of 631 bp, 1030
bp, and 263 bp, and 13 kb of sequence downstream from the polyadenylation site. The
arrangement of four exons and three introns and the placement of introns within the
sequence is similar to that described for other plant peroxidases (Simon, 1992; Osakabe
et al., 1995).

Primers were designed from the DNA sequence to compare EpEp and epep
genotypes by PCR analysis. Figure 6 shows PCR amplification products from four
different primer combinations using OX312 (epep) and OX347 (EpEp) genomic DNA
as template. The primer annealing site for prx29+ begins 182 bp upstream from the
ATG start codon; the remaining primer sites are shown in Figure 1. Amplification with
primers prx2+ and prx6-, and with prx12+ and prx10-, produced the expected
products of 1.9 kb and 860 bp, respectively, regardless of the Ep/ep genotype of the
template DNA. However, PCR amplification with primers prx9+ and prx10-, and with
prx29+ and prx10-, generated the expected products only when template DNA was
from plants carrying the dominant Ep allele. When template DNA was from an epep
genotype, no product was detected using primers prx9+ and prx10- and a smaller
product was amplified with primers prx29+ and prx10-. The products resulting from
amplification of OX312 or OX347 template DNA with primers prx29+ and prx10-
were directly sequenced and compared. The polymorphism is due to an 87 bp deletion
occurring within this DNA fragment in OX312 plants, as shown in Figure 5. This
deletion begins nine bp upstream from the translation start codon and includes 78 bp
of sequence at the 5' end of the open reading frame, including the prx9+ primer
annealing site.
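The figures given for the deletion are mutually consistent, which can be verified with simple arithmetic: 9 bp upstream of the start codon plus 78 bp of open reading frame gives 87 bp, and 78 bp corresponds to 26 codons, the length of the putative signal sequence. The sketch below spells this out; the function and key names are hypothetical.

```python
def deletion_summary(deletion_bp, upstream_bp, signal_peptide_aa):
    """Relate a 5' deletion to the coding sequence it removes.

    deletion_bp:       total size of the deletion
    upstream_bp:       bases removed upstream of the translation start codon
    signal_peptide_aa: length of the signal peptide in amino acids
    """
    orf_bp_removed = deletion_bp - upstream_bp
    codons_removed = orf_bp_removed // 3
    return {
        "orf_bp_removed": orf_bp_removed,
        "codons_removed": codons_removed,
        "whole_codons": orf_bp_removed % 3 == 0,
        "start_codon_removed": orf_bp_removed >= 3,
        "covers_signal_peptide": codons_removed >= signal_peptide_aa,
    }


# Values stated in the text: 87 bp deletion, 9 bp upstream, 26-residue signal.
summary = deletion_summary(deletion_bp=87, upstream_bp=9, signal_peptide_aa=26)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```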

To test whether this deletion mutation cosegregates with the seed coat
peroxidase phenotype, genomic DNA from an F2 population segregating at the Ep locus
was amplified using primers prx9+ and prx10- and F3 seed was tested for seed coat
peroxidase activity. Figure 7 shows the results from this analysis. Of the 30 F2
individuals tested, all 23 that were high in seed coat peroxidase activity produced the
expected 860 bp PCR amplification product. The remaining seven F2 individuals with
low seed coat peroxidase activity produced no detectable PCR amplification products.
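The 23 high to 7 low split among the 30 F2 individuals can be compared with the 3:1 ratio expected for a single dominant gene. The sketch below computes a chi-square goodness-of-fit statistic for that comparison. It is offered only as an illustration: the source does not report a statistical test, the function name is hypothetical, and 3.841 is the standard 5% critical value for one degree of freedom.

```python
def chi_square_statistic(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for counts against a ratio."""
    if len(observed) != len(expected_ratio):
        raise ValueError("observed and expected_ratio must have equal length")
    total = sum(observed)
    ratio_total = sum(expected_ratio)
    expected = [total * part / ratio_total for part in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# F2 counts from the text: 23 high-peroxidase, 7 low-peroxidase individuals.
statistic = chi_square_statistic([23, 7], [3, 1])

# Standard critical value for 1 degree of freedom at the 5% level.
CRITICAL_VALUE_5PCT_DF1 = 3.841

if __name__ == "__main__":
    print(f"chi-square = {statistic:.4f}")
    print("consistent with 3:1" if statistic < CRITICAL_VALUE_5PCT_DF1
          else "deviates from 3:1")
```

With expected counts of 22.5 and 7.5, the statistic is far below the critical value, so these counts give no reason to reject a 3:1 segregation, although a sample of 30 has limited power.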

Finally, to determine if the OX312 (epep) and OX347 (EpEp) breeding lines are
representative of soybean cultivars that differ in seed coat peroxidase activity, several
cultivars were tested by PCR analysis using primer combinations targeted to the Ep
locus. Figure 8 shows results from this analysis of six different soybean cultivars, three
each of the homozygous dominant EpEp and recessive epep genotypes. As observed
with OX312 and OX347, amplification products of the expected size were produced
with primers prx12+ and prx10- regardless of the genotype, whereas epep genotypes
yielded no product with primers prx9+ and prx10- or a smaller fragment with primers
prx29+ and prx10-.

Example 4: Developmental Pattern of Expression of the Ep Gene

The seed coat peroxidase mRNA levels were determined by hybridizing RNA gel blots
with radiolabelled cDNA probe. Figure 9 illustrates the transcript abundance in
various tissues of epep and EpEp plants. The mRNA accumulated to high levels in seed
coat tissues of EpEp plants, especially in the later stages of development when whole
seed fresh weight exceeded 50 mg. Low levels of transcript could also be detected in
root tissues but not in the flower, embryo, pod or leaf. The transcript could also be
detected in seed coat and root tissues of epep plants but in drastically reduced amounts
compared to the EpEp genotype. The reduced amounts of peroxidase mRNA present in seed
coats of epep plants indicate that the transcriptional process and/or the stability of
the resulting mRNA is severely affected. The Ep gene has a TATA box and a 5' cap signal
beginning 47 bp and 15 bp, respectively, upstream from the translation start codon.
The 87 bp deletion in the ep allele extends into the 5' cap signal and therefore could
interfere with transcript processing. Regardless, any resulting transcript will not be
properly translated since the AUG initiation codon and the entire amino-terminal
signal sequence are deleted from the ep allele. Not wishing to be bound by theory, the
lack of peroxidase accumulation in seed coats of epep plants appears to be due to at
least two factors, greatly reduced transcript levels and ineffective translation,
resulting from mutation of the structural gene encoding the enzyme. In summary, the
results indicate that the Ep gene regulatory elements can drive high level expression
in a tightly coordinated, tissue and developmentally specific manner.

All scientific publications and patent documents are incorporated herein by
reference.

The present invention has been described with regard to preferred
embodiments. However, it will be obvious to persons skilled in the art that a number
of variations and modifications can be made without departing from the scope of the
invention as described in the following claims.

SEQUENCE LISTING

(A) NAME: [illegible]
(B) STREET: [illegible]
(C) CITY: London
(D) STATE: Ontario
(E) COUNTRY: Canada
(F) POSTAL CODE (ZIP): [illegible]

(ii) TITLE OF INVENTION: [illegible] Regulatory Region and Peroxidase

(iii) NUMBER OF SEQUENCES: 2

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(A) LENGTH: 1244 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: [illegible]
(D) TOPOLOGY: [illegible]

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1056

(ix) FEATURE:
(A) NAME/KEY: sig_peptide
(B) LOCATION: 1..77

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATO OQT TCC ATO OCT CXA TTA OTA OTG OCA TTO TTO TOT GO TTT CCT 
Wet Oly Sar net Arg La*i Lrj Val val Ala Lau Lau Cym Ala l?hm Ala 
IS 10 15 

ATO CAT OCA OCJT TTT TGA OTC TCT TAT OCT C*C CTT ACT CCT ACQ TTC 
Mat Hia Ala Oly Pfa* S*r Val Sar lyr Ala OU* Lau Thr Pro Thr Pha 
20 25 10 

TAC AOA OAA ACA TTT OCA AAX CIO TTC CCT ATT OTO TTT OCZA OTA ATC 
Tyr Arg Olu Thr Cym Pro Am Uu Pha Pro Ha Val Pba Oly Val Ha 
35 40 45 



TTC CAT OCT TCT TTC ACC OAT OCC COA ATC 000 OCC ACT CTC ATC M3Q 



CA 822M8I8 19t7-S9- |* 




Pbe Asp Ale S«r Phe Thr Asp Pro Arg lie Oly Ale Ser Leu Met Arg 

SO 55 SO 

CTT CAT TTT CAT GAT TOC TTT Orr CAA OOT TOT OAT OOA TCA OTT TTO 240 
Leu Oie Phe His Asp Cy* Phe Vel Oln Oly Cy» Asp Oly Ser Vel Leu 
5 «5 70 75 SO 

CTO AAC AAC ACT OAT ACA ATA OAA AOC OM CAA GAT OCA CTT CCA AAT 288 
Leu Asa Asa Thr Asp Thx lie Olu Ser Olu Oln Asp Ale Leu Pro Asa 
85 90 95 

10 

ATC AAC TCA ATA AOA OOA TTO OAC OTT OTC AAT OAC ATC AAO ACA OCT 336 
He Asa Ser lie Arg Oly Leu Asp Vel Vel Asn Asp tie Lye Thx Ale 

100 105 110 

IS OTO 3AA AAT AOT TOT CCA OAC ACA OTT TCT TCT OCT OAT ATT CTT X7T 384 
v*l Olu Asn Ser Cye Pro Asp Thr Vel Ser Cys Ale Asp He Lsu Ale 

115 120 12S 

ATT OCA OCT OAA ATA OCT TCT OTT CTO OOA OOA OOT CCA OOA TOO CCA 432 
20 He Ale Ale Olu He Ale Ser Vel Leu Oly Oly Oly Pro Oly Trp Pro 
130 135 140 

OTT CCA TTA OOA AOA AOO OAC AOC TEA ACA OCA AAC CZ2A ACC CTT OCA 480 
Vel Pro Leu Oly Arg Arg Asp Ser Lsu Thr Ale Asn Arg Thr Leu Ale 
25 145 150 1SS 1(0 

AAT CAA AAC CTT CCA OCA CCT TTC TTC AAC CTC ACT CAA CTT AAA OCT 528 
Asn Ola Asn Lsu Pro Ale Pro Phe Vbm Asa Leu Thr Oln Leu Ly* Ale 

ICS 170 175 

TCC TTT GCT GTT CAA OOT CTC AAC ACC CTT GAT TTA CTT ACA CTC TCA 
Ser Pbe Ala Vel Gin Oly Leu Aen Thr Leu Asp Leu Vel Thr Leu Ser 
100 US 190 

OGT GOT CAT AOC TTT C3UA AGA OCT COO TOC ACT ACA TTC ATA AAC CGA 
Gly Gly Hie Thr Phe Oly Art/ Ale Arg Cye Ser Thr Phe He Aen Arg 

19S 3*0 205 

TTA TAC AAC TTC AGC AAC ACT CGA AAC CCT GAT CCA ACT CTG AAC ACA 
Leu Tyr Aen Phe Ser Aen Thr Oly Aen Pro Aep Pro Thr Leu Aen Thr 

210 215 220 

ACA TAC TTA OAA OTA TTO COT OCA AOA TOC CCC CAO AAT OCA ACT GOO 
Thr Tyr Vm\x Olu Vel Leu Arg Ale Arg Cye Pro Oln Aen Ale Thr Gly 
225 230 23S 240 

GAT AAC CTC ACC AAT TTU GAC CTO AOC ACA CCT GAT <.^A TTT GAC AAC 
Aep Aen Leu Thr Aeu Leu Aep Leu Ser Thr Pro Aep Gin Phe Aep Aen 
245 250 255 

AOA TAC TAC TCC AAT CTT CTO CAO CTC AAT OOC TTA CTT CAO AST OAC 
Arg Tyr Tyr Ser Aen Leu Leu Oln Leu Aen Oly Leu Leu Oln Ser Asp 
2<0 2CS 270 

CAA OAA CTT TTC TCC ACT CCT OOT OCT OAT ACC ATT CCC ATT OTC AAT 
Qln Olu Leu Phe Sex Thr Pro Oly Ala Aep Thr lie Pro lie Vel Aen 
275 200 205 

A*XT TTC AOC AOT AAC CAO AAT ACT TTC TTT TCC AAC TTT AOA OTT TCA 
Sex Phe Ser Ser Aen Oln Aen Ttw Phe Phe Ser Aen Phm Arg Vel Ser 

290 295 300 
ATO AT* AAA ATO OCT AAT ATT OOA OTO CTO ACT OOO OAT OAA OOA CAA 9(0 
•tec II Lys Mat Oly A*a Zl« Oly Val L*u Thr Oly A#p Olu Oly Olu 
305 310 313 320 

ATT OGC TTO CAA TOT AAT TTT OT3 AAT OOA OAC TOO TTT OUA TTA OCT 1004 
5 11% Arg Uu Ola Cy» Am 9hm Vml An Oly A#p Smr Pbm Gly L«u Ala 

*25 330 335 



WJT OTO OOO TOC AAA OAT OCT AAA CAA AA0 CTT OTT OCT CAA TCT AAA 10S4 
Smx V*l Ala Smr Uym Amp Ala Ly» Ola Ly» tmu Val Ala Ola Ser Lya 

345 350 






T AAA CC AAVA ATIAATOOOO ATOTOCATOC TAOCTA0CAT OTAAAOOCAA ATTATOTTOT 1114 
AAACCTCTTT OCTAGCTAT* 7TUAAAXAAA CGAAAOOAOT AOTOTGCATO TCAATTCQAT 1176 

rrroccATOT AccrcrraaA atattatota aiaattattt qaatctcttt >ao^actva 123s 



ATTAATCA 

1244 



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4700 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: promoter
(B) LOCATION: 1..1532

(ix) FEATURE:
(A) NAME/KEY: sig_peptide
(B) LOCATION: 1533..1409

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1533..1751

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2383..2574

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 3605..3769

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 4033..4514

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1752..2382

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2575..3604

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 3770..4032

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1533..1751

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 2383..2574

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 3605..3769

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 4033..4514



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



TA flA T AAAAA AATOXZATAT AATTTTTCTC AOAIOrCQTT TATACTQTTT TTTTAATCRO 
AATTAAAATT CCTCTTTAAT tATCQACAIA AXTTTTTTTQ QTOAAIMT* TQZACATAAT 



TATTTAAXAC AAATTTTTAT TOTACATAOA AOTOATACTT CAATTCTAAT ATXQOMAAC 
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AGtACOAAAA GATAAAAAAA CTOTTATTA0 AAGAAAAAAA TATATUGAAA AGOTTAOCTA 240 

CATATATTAG CTAAATTAOT AV1TLTA ATT OOCTATATAA ACCCTATTOT ACTCTTTGTA 300 

ATCTCAOCTT TTTCATTTAA ATACATTTCT ATTTTTTAAO TTC TA TATTT TCTCTTVATT 3«0 

TTCTTCOATA AACGATOAAA TTTAACATOO TAXATGAGO0 ATACCAOCCA CTTTOAAA0C 420 

CATOTATOOC TAOTATOOOC AGCCAAAATT TOCCCPOOTT CAA0CAAAGC AAOTOTTTAT 400 

10 ATAOATOTOA CTTTTOTTOA OOAACTCATO CCAATOOTAC TQATTOTOAA ACTOAOAAAA 540 

CTAATTTQOA OAATTTOAAT TATOATCATT AAATACTCCT CTCCTGACTX CC ri WmJC 600 



TCAAATTTOT ACCATCATTA TTTCCCAAAA ATTTGATTAC AATGCACTAA TTAATOAATO 660 



TTTCTTACAT TXTCAXATTA TCATATCTGA CATTTTOTTT TTACTTTTTA TAATAATTAT 



720 



TTTAAAAA0T CATACATOCA AAXAATTTTT TAATAOTTTA CAOTTAAATT TTTACACTAA 780 

20 AAATGCATQA AAATTAAACT TTATTTTTCC AA0TGATCAX TTftOTCAAAT CCCAAAACAA 8*0 

TOATTATTTT TTOGAAATOA ATOTTTATTO AACAtTTAAA TOTAOCCTAA TTA A TTCT G O 900 

TTATCOTOTC AATQTTCCAA AACCXAAXQC AASATCmS CAAOT3tCATA CATAGATCTA 960 

ATTTTAAACT TATCTTTA03 CAAflAOATAT AAAflATTAIA CATCTAOTTT tAAACATTAA 1020 

CTmOTTTT TOTOTTAAAA AACAOTAACA TTTTCTTAAT TTTQTAOAOT OAOROCTCC 1080 

30 AACCATATTA AOEXAAOATTT TAATT30TAT TGMKRTCKT OAACTtAOTA AAIAAGTTTT 1140 
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ocrcrroffr tttcaatttt cattacaaca ttcatotaaa atatcaaobt tttctoaaat 1200 

TTUllUCUV TOTOCTCCAA COWCATTTAA OMlUnUKi AAATTAAXTT TCAAOAAOAT 1260 

AATOATTCCT ACTCTTOCTO OCCCTACCAX AOTACAATAA ATCCACTCAT AAATCAACAA 1320 

CTC9T0BTCA TAOOCAATTO OOCATCATAT CATAAACAAT ACCTACOTGA TATTATCrAQ 13«o 

TOTCICTCAO TTTACTTTAT OAaAAATTAT TTTTCTTTAA AAAAACTTAA TTAATAAAAA 1440 

CATTTOCOAI ACCOTOAQTr ACAAOAAATC OOCOOAATTC ATCTCtATAA ATAAAAGOAT 1S00 

CTATATOAOA OOTAAAATCA TATTAACTCX AA ATO OOT TCC ATS OCT CTA TTA 1S5J 

Mat Oly Sar Mat Arg Lau Lau 

3SS 

OTA OTO OCA TTO TTO TOT OCA TTT OCT ATO CAT OCA GOT TTT TCA OTC 1601 
Val Val Ala Lau Lau Cya Ala Pfaa Ala Hat Hi. Al* Oly Pha Sar Val 

375 



"° 368 j 70 



TCT TAT OCT CAO CTT ACT CCT A03 TTC TAC AOA OAA ACA TOT CCA AAT 1«49 
Sac Tyr Ala Ola Lau Thr Pro Thr Pha Tyr Axg olu Thr Cy. p ro Aaa 
3,0 "* 390 



1697 



CTO TTC CCT ATT OTO TTT OOA OTA ATC TTC OAT OCT TCT TTC ACC GAT 
L*u Wbm Pro Ila Val Pha aiy VUUiN»J)tp Al* Sar Pha Thr Aap 
JM 400 40S 



<^cahATc<m<KCHncKKtaHncrtaamaa<uaTC!CTTT 1743 

Pro Arj U. Oly AU iwlau mi aij Uu Ml. Pha Hi a Aap Cy. Pha 

4X3 420 
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OTT CAA GTAOGTACTT Villi 11 HC TTCCAAAATO CCCTCCATAT TEAACAAGAT 18G1 
Val Oln 
425 

TOCTTTOTTC ACCTAOAAAA ATOTOTTTTT TTCAACOATC TTACOTACOT ITO TI TCUTt 1861 

TOAAAAATAA ATCAOAAAQA OATCAAOAAA ATAOCTAOAA AOAAAGCAAC GTTTTTTTAA 1921 

AA0GTATTTA CTCTOAOAAA AATATTAAAA CTGAAGAOAA AOAAATTAAA TAAGCTTTTC 1981 

10 TtOAATOATA TTTACATOTC TTATTAACTT AAASTCACCT UlllllUA AfflTOTGCTT 2041 

QAAflAAAAAA OATOTCTJTC A0TTTA0TTT TOATTAATOC TAATTATATT TTTAATTAAT 2101 

TAATTAATAC TATATATCTA TTTACCATAT TAATTATTAC TATATTTCAT OATOACAACA 2161 

OACAAOTATT CTAAAGACGT ATCGGTAOAT QATTAATTTT TTTATAAAAA AATCTTTTGC 2221 

OTOTATAGAT ATTCTTTTAT AATTOOTOCA OAAACTTOTA ATQCTAAXTO CAATTAATCT 2251 

20 TACATTOATT AACTAATAOC TATAATCAAT ATTTAOOTTA OOTATAOOAO ACAAATCAAO 2341 

TOATCTOAAC AAATTAAOTT OTTATATTTO CATTOTOACA O OOT TOT OAT OOA 2394 

31y Cya Asp Gly 
1 

TCA OR rtO CTG AAC AAC ACT OAT AC* AXA GAA A0C OA0 CAA OAT OCA 2442 
Sar Val Lau Lau Aan Aan Thr Aap Thr 21a Olu Sar Olu Oln Aap Ala 

S 10 IS 20 

30 CTT CCA AAT AXC AAC TCA ATA AOA OOA TTO OAC OTT 0TC AAT OAC ATC 2490 
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Leu Pro Am Zle Am Ser tie Aro Oly Leu Am Val Val Aaa Asp Zla 

as jo 35 

AAO ACA GOO OTQ GAA AAT AQT TOT CCA OAC ACA CTT TCT TGT OCT OAT 2538 
Lya Thr Ala Val Olu Am Ser Cya Pre Am Thr Val Sar Cya Ala Am 
40 « 50 



2584 



ATT CTT OCT ATT OCA OCT OAA ATA OCT TCT QTT CTO OTAATTAATA 
He Leu Ala ll« Ala Ala Olu Zla Ala Sar Val Leu 
55 CO 



ACTCCTAATT AATTCOZAAC CATTAAAAAO TTGCATQATT OOATTCAAAA UllAlU/rA 2«44 
TTGOOOTTCT QATATAAATT TOTAATTAAA TTGCACTAAA AAAAATTATC ATATACTTTT 2704 

AATAAAAAAA ATTTATCTAA TTTAATTTAT TATTAAAACT ATTTTTAAAA TTCAATCCTA 27S4 

ACTCTTTTTT AATCOOAOCA TOTAAOCTOO CACCCACCOT ATATCBTTOO AAOATOCTAT 2824 

AAAACCATTT AATTAATOOA TOQAATOOT CAAAACATTT AATTCAAAAT ACTCTTAATT 2884 

OTOATtACTA ATCATOTTCO OGCAAQTTAC OTTOTOTATA ATTAATTTOA CTTAATCAOA 2944 

TAAAAAAACA AATGOACOCA AOCO30TT!» TAXAOATATC ACTOOCCTOT AOAATATOTO 3004 

OTTTTTCACO TTTAAATAAA AOCtAOCtAC ZMXPTAIXI TTAOTCTTTT TTTTTCTTAA 3064 

ACCCATTTAA CQTOATTTAT TaACTOTOftA ACATOTTTCC ACACACAOOC TTAOAAACTC 312* 

CTCOCAACtA AC&TCTOCAA AATTTOACTA TTTATTTATO AAOATAATTC ATCTATOATO 3184 
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ttcaactcta ttatataxat oTATCATcac actattaaga attataataq tcaaatatao 3244 

AAOTATATCQ OOTAAATOTA OTTOCATOTO COJWXTCTTT CGTOTAAAAT OCTTATTCTA 3304 

TATXGCTTTT TTTATTOGAA AATAACOATO AACEAAAAAC OAAAOGQTAT CATATADTTT 3364 

QfcCrmXTO TTAOAGAOAO ACATCTTAAT TWOTCATAT OTTAAATAAT TAATTACAAT 3424 

OCA1ACACAA ATATTTAXOC CATATCTAAA AAATOATAAA ATATCATAOO TAXACTCAAC 3484 

10 TATATOATAT CCCCATAACA OAAATTOTAC n W C TTCAO OCAATOAACT TAACATTTCT 3544 

OTPTOCTAAA AACAAACATC CACTTAAAOT OQTTCAACAT ATTTATOTAA TAATTTACAO 3604 

«3AO<lAG<rrCCAGOATOOC<AOTlCeA«AaOAJ^AM 3652 
15 Oly Oly Oly Pro Oly Trp Pro V«l Pro Uu Oly »rg Aig i*p Ser Leu 

! 5 " 1S 



3700 



ACA OCA AAC COA ACC CTT OCA AAT CAA AAC CTT CCA OCA. CCT TTC TTC 
The Ale A>a Arg Thr Leu Ale Kmn Ola Ami Leu Pro Ala Pro Phe Phe 
20 20 25 30 



AAC CTC ACT CAA CTT AAA OCT TOC TTT OCT OTT CAA OOT CTC AAC ACC 374B 
Asa Leu Thr Oln Leu Lye Ale 3er Phe Ale Vel Ola Oly Leu Aen Thr 
35 40 4S 

CTT GAT TTA OTT ACA CTC TCA OOTATACAXA ATCAATTTTT TATTTOCtAT 3799 
Leu Amp Leu Vel Tbr Leu Ser 

so s* 

30 TACCTAGCAA TAAAAAOTCT CTQATACAOA CATATTTAOA TAAATTAATT TCTCCATAAft 3859 
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20 
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CATTTATAAT AAAATTATCA ATTTATOTAC TTAAAAATTA TCGATTCAAQ CTCTTTTCAT 3919 



CCAACTTTTA CTAAAOTTAA OOTOCATATA ATATAAAATA AACTATCTCT TOT 



TTCTTAT 3979 

AAAAAGATTO AAOATAAOTT AAACTCTACT TATAAATCAT TAATATATOT ATA COT 4035 

Oly 

1 



OCT CAT ACQ TTT GGA AGA OCT 000 TOC AOT ACA TTC ATA AAC OQA TTA 
Gly His Thr Phe Gly Axg Ala Axg Cye to Thr Phe lie Am Arg Lau 

10 S « 0 is 

TAC AAC TTC ACC AAC ACT OQA AAC OCT GAT CCA ACT CTO AAC ACA ACA 
Tyr Abti Phe Ser Aan Thr oly A*n Pro Aap Pro Thr Leu Aan Thr Thr 
20 « 30 



4083 



4131 



TAC TTA GAA GTA TT0 OCT GCA AQA TOC CCC CAQ AAT OCA ACT OOO OAT 4179 
Tyr Leu Glu Val Leu Arg Alt Arg Cyi Pro Cln Asa Ale Thr Oly Aap 
35 40 45 

AAC CTC ACC AAT TTO QAC CTO AOC AOk OCT OAT CAA TTT GAC AAC AQA 4227 
Ain Uu Thr Am Uu Aip Uu 8«r Thr Pro Cln Phe A*p A* a Arg 

50 55 <o ss 



TAC TAC TCC AAT CTT CTO CM CTC AAT OOC TTA CTT CAO ACT GAC CAA 427S 
25 Tyr Tyr 9«r A« Uu Uu OU tw An Oly UqUuGU Ser A*p Oln 

7 ° 75 t0 

GAA CTT TTC TOC ACT OCT GOT OCT GAT ACC ATT COC ATT 0TC AAT AGC 4323 
Olu Leu P!>e Ser Thr Pro Oly Alt Aip Thr lie Pro lie Vel Am Ser 

30 as S0 „ 
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TTC AOC AflT AAC CAO AAT ACT TTC TTT TCC AAC TTT ASA QTT TCA ATS 4371 
Phe Ser Ser Km 01 n Asn Ttvr Phe Phe Ser Ann Pbe Arg Val Ser Met 

100 105 HO 

ATA AAA ATO QOTT AAT ATT OOA OTO CtO ACT OCX* OAT OAA OOA OAA ATT 4419 
5 lie Lye Wet Oly Aen lie Oly Val Leu Thr Oly A»p Olu OXy Olu lie 

115 120 125 

OOC TTO CAA TOT AAT TTT OTO AAT OOA OAC TCO TTT OOA TTA OCT AOT 4467 
Arg Leu Oln Cye Aan Phe Val Aan Oly Amp Ser Phe Oly Leu Ala Ser 

10 130 135 140 14S 

OTO OCO TCC AAA GAT OCT AAA CAA AM CTT OTT OCT CAA TCT AAA TAA 4515 
Val Ala Ser Lye Asp Ala Lye Oln Lye Leu Val Ala Oln Ser Lys * 

150 155 160 

ACCAATAATT AATOOOOATO TOCAT0CTA0 CTAOCATOTTA AAOOCAAATT AOOTTOTAAA 4 575 
CCTCTTTOCT AOCTATATTO AAATAAACCA AM1QM9TAJ3T OTOCATOTCA ATTCGATTTT 4635 
20 OCCATOTXCC TCTTWATA TTATOTAATA ATTATTTOAA TCTCTTTAAO OTACTTAATT 469S 



AATCA 4700 
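The exon and intron entries in the feature table of SEQ ID NO:2 should tile the transcribed region without gaps or overlaps. The Python sketch below derives the introns from the exons and compares them with the listed introns. The coordinates are a reading of the scanned feature table and should be treated as assumptions; `EXONS`, `LISTED_INTRONS`, `introns_between` and `lengths` are illustrative names.

```python
# Exon and intron coordinates (1-based, inclusive) as read from the
# feature table of SEQ ID NO:2; the scan is damaged, so these values
# are an assumption rather than a verified transcription.
EXONS = [(1533, 1751), (2383, 2574), (3605, 3769), (4033, 4514)]
LISTED_INTRONS = [(1752, 2382), (2575, 3604), (3770, 4032)]


def introns_between(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def lengths(intervals):
    """Length in nucleotides of each inclusive interval."""
    return [end - start + 1 for start, end in intervals]


print(introns_between(EXONS) == LISTED_INTRONS)  # True
print(lengths(EXONS))                            # [219, 192, 165, 482]
print(lengths(LISTED_INTRONS))                   # [631, 1030, 263]
```

A mismatch between the derived and listed introns would point to a misread digit in one of the LOCATION lines.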
THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. An isolated DNA molecule comprising the nucleotide sequence of SEQ ID
NO:1.

2. An isolated DNA molecule comprising at least 24 contiguous nucleotides
selected from nucleotides 1-1532 of SEQ ID NO:2.

3. The isolated DNA molecule comprising a nucleotide sequence substantially
homologous to nucleotides 1533-4700 of SEQ ID NO:2.

4. The isolated DNA molecule of claim 3 comprising a nucleotide sequence
substantially homologous to that of nucleotides 1-4700 of SEQ ID NO:2.

5. The isolated DNA molecule of claim 3 comprising nucleotides 1533-4700 of
SEQ ID NO:2.

6. The isolated DNA molecule of claim 4 comprising the nucleotide sequence of
SEQ ID NO:2.

7. The isolated DNA molecule of claim 2 comprising a nucleotide sequence
substantially homologous to that of 1-1532 of SEQ ID NO:2.

8. The isolated DNA molecule of claim 7, comprising the nucleotide sequence of
nucleotides 1-1532 of SEQ ID NO:2.

9. An isolated DNA molecule of claim 3 comprising at least 32 contiguous
nucleotides selected from nucleotides 412-1041 of SEQ ID NO:2.
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10. An isolated DNA molecule of claim 9 comprising the nucleotide sequence of 
412-1041 of SEQ ID NO:2. 

11. An isolated DNA molecule of claim 3 comprising at least 23 contiguous 
nucleotides selected from nucleotides 1234-2263 of SEQ ID NO:2. 

12. An isolated DNA molecule of claim 11 comprising the nucleotide sequence of 
1234-2263 of SEQ ID NO:2. 

13. An isolated DNA molecule of claim 3 comprising at least 22 contiguous 
nucleotides selected from nucleotides 2430-2691 of SEQ ID NO:2. 

14. An isolated DNA molecule of claim 13 comprising the nucleotide sequence of 
2430-2691 of SEQ ID NO:2.

15. A vector which comprises the DNA molecule of claim 1. 

16. A vector which comprises the DNA molecule of claim 2. 

17. A vector which comprises the DNA molecule of claim 3. 

18. The vector of claim 16 which comprises a heterologous gene of interest under
control of the DNA molecule.

19. A host cell capable of expressing the DNA molecule within the vector of claim
15.

20. A host cell capable of expressing the DNA molecule within the vector of claim
16.

21. A host cell capable of expressing the DNA molecule within the vector of claim
17.

22. A host cell capable of expressing the DNA molecule within the vector of claim
18.

23. A transgenic plant comprising the vector of claim 15. 

24. A transgenic plant comprising the vector of claim 16. 

25. A transgenic plant comprising the vector of claim 17. 

26. A transgenic plant comprising the vector of claim 18. 

27. A method for the production of soybean seed coat peroxidase in a host cell 
comprising: 

i) transforming the host cell with a vector comprising an isolated DNA
molecule selected from the group consisting of SEQ ID NO:1 and SEQ ID
NO:2; and

ii) culturing the host cell under conditions to allow expression of the soybean
seed coat peroxidase.

28. A process for producing a heterologous gene of interest ...

a transformed plant with the vector of claim 16.

29. The process of claim 28 wherein the heterologous gene of interest is produced
within seed coat cells. 
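Several of the claims above are framed as a molecule comprising at least N contiguous nucleotides selected from a stated range of SEQ ID NO:2 (for example, at least 24 from nucleotides 1-1532). As a purely computational illustration, and not a legal test of claim scope, the sketch below checks whether a candidate sequence shares a run of at least N identical consecutive bases with a 1-based region of a reference. The function name, the toy reference and the single-strand comparison are all assumptions made for illustration.

```python
def has_contiguous_run(candidate, reference, start, end, min_len):
    """True if `candidate` contains at least `min_len` consecutive bases
    that also occur consecutively within reference[start..end].

    `start` and `end` are 1-based and inclusive, as in the claims.
    Only the given strand is compared (no reverse complement).
    """
    region = reference[start - 1:end].upper()
    cand = candidate.upper()
    if len(cand) < min_len or len(region) < min_len:
        return False
    # Every window of length min_len that occurs in the region.
    windows = {region[i:i + min_len] for i in range(len(region) - min_len + 1)}
    return any(
        cand[i:i + min_len] in windows
        for i in range(len(cand) - min_len + 1)
    )


# Hypothetical toy reference; it is not taken from SEQ ID NO:2.
TOY_REFERENCE = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

probe = "TTTT" + TOY_REFERENCE[5:17] + "TTTT"   # carries a 12-base run
print(has_contiguous_run(probe, TOY_REFERENCE, 1, 39, 12))
print(has_contiguous_run(TOY_REFERENCE[20:32], TOY_REFERENCE, 1, 15, 12))
```

Comparing fixed-length windows is sufficient because any shared run longer than N necessarily contains a shared run of exactly N.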
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FIGURE 1 

AT30GTTCCAT0CGTCTATT 20 
H g 3 H R li L 

prx9+ > 

AGTAGTOGCAriXiTTOTQTG^^ 80 

vva i .t. gA»AMgAQFSVSYA Q 1 
signal sequence

GCTTACTCCTACQTTCTJUCAQAQAAA^ "° 

LTPTFYRSTCPHI-FPIVFGV 21 

prx!2* > 

AATCTTCOATOCTrCTTrCACara 200 
IFDASFTDPRI a A g L H A — L — H — E 41 

active site

I < 

TCATOATTGCTTTGTTCAAG GTTCnWATQQATCALJI 'l'l' 1 T GAACAACACTQATACAAT 260 
H D C PVQ OC OOSVI.I.lfK TDTI 61 

--prxlO- - prx2+ > 

AGAAAOCaAOCAAGATGOtfrTTCCAA 320 
BSBQDAI*PlTI»SIRat»DVVN 81 

TGACATCJUUSACAGCGGTQGAAAATAOTT3TCCM 3 30 

D I KTAVBMSCPDTVS CAD I L 101 

II 

TOCTATTOCAGCTGAAATAOCITCtafCTCPO GGAOQAGCTCGAOQATQOCCAGTTCCATT 44 0 
AIAABXA3VL Q GOPGWPVPl, 121 

AGGAAGAAGGOACAGCTTAACAGCAAACCGAACCCTTt^^ 500 
OR RD S L.TAMRTLANQ.TL PA P 141 

TrrC"ritIAACCrCACTCAACTTAAAUCri'CC 1 1 iUC 1U r I C AAGC3TCTCAACACCCTTQA 560 
PPMLTCLCA S PAV QOLMTLD 161 

III 

TTTAGTTACACTCTCAG QTOGTCATA LWl 1XJU AAQAGCTCQCTTQCACTACATTCATAAA 620 
r, V T I, S O Q g T F ORARCSTFIM 181 
heme-binding domain



COOATTATACAACTTCAOCAACA C TPgRAAOCCTQ^ «»0 

RLYHF6WT0MPDP TLHTTYX- 201 

AOAAOTATTOCOTOCAAOATOCCOCCAflAA^ 740 

BVLAARCFQHAT3DHLTHLD 221 

CCTqAQCACACCTOATCAATTTt3A^ 300 

L3TPDQFDJtRYYS»T. L QLHO 241 

CTTACTTCAGAOTOACCAAQAA Crri ' ll. iCCACTCCTCJQTGClUATACCATTCCCATTOT 860 
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ftQATAAA 

V 8 FS3VQMTPF9]f7BV9MXK 

AATOCOTAATATTOQAgTCXTTaACT l ' IOJCrit3 CAATOTAXTTT 
MGVXOVLTODSQSX&LQCVF 

VNQD8FOLA«VASXDAXOXL 



TOTTGCTCAA1 

V A Q 3 K ♦ 



KKX^MCTAOCATCRAA 



TOCATtm!XXTTT»ArrrTtKXATOTACCI 
CTCTTTAAOOTACrr AATTAATC (A) a 



361 



920 
361 

980 

301 

1040 
331 

1100 
336 

1160 

1330 
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10 Z'J 30 4C 50 *0 

ii : 

l "X2ATCATATC A T (UUkCAA rACGTACCTGATATTATCTA^TCTrCTXTr AG* nTACTTT ATG 

6 1 ACA.\ATT A TTTT r TTTT AAAAA A A .7TT AATT AAT AAAAACA" TTOC JAT >.C COTO AOTT A 

12 1 CA \a.\X^TCCGCCGAAlTCATCTCrAT AAATAAAA^iATCCAT ATG> OAGCTA^ AAT JAT 

IB" A ~*T AA i*CAAAATOO*7TTC TATOOTT CT AT r AXJTAOTOG 1A TTOTTT TTTGCA TTTOCT A 

2 4 1 r.r^ ^ - X7l"rrriXlAGTCTCTTATOCTCAGCTTAC^ 

1 1 1 ;rccAAATCTv:TcccTA7Tcm> rccxc 

i 6 i cy^TCtxxxxrcACTcrcAiu^ r 

4il - T^T-rnr-^CTTCCAJUUVTaCCCiXXl^TAI^ 

481 Arrr-TT-LTTTT 1XACC ATCTf ^ Ct7TACGTTTOTTTO<TrTTt3AA/ ^ AT AAJ*TCAOA^AOA 
!>4 1 nAT^XAC AAAATX3CTAGAAAGAA-\CC AACGTTTT'*TrAAA^ SOTMTTACTGTGAGAVA 
oOl AAT A 7T AAAACTGAAGAAaAAAG AAATTAAA 1 AAOCTTTTCTTC3 AATOA TAT*i T ACATO TC 
«<!1 TT ATT> ACTTAAAi^- .XJ^C*!*rTTTTCTTTAAC - t"rC3T 3CTTGAA£^JV/AAA£ ATGTCTTTC 
7 2 1 ACT ~ ACTXITGAT^AAttJCTAAT^^^ CTA rATATCTA 

7 91 TTT A ^ " A T Ik T AA rrATTA^ATATTTCAT\^TGAC*ACA^\CAAC-TATT ~r AAAGA3CT 

8 4 1 ATC;VTA.U^TTA>T I' lVl I I ATAAAAAAATCTT1 TGOGTU 1 XTAGA r AH CTTTrAT 

9 0 l A .r. GO TGCAGAAA ^TTnTAATOCTAATTS-'^AATTA A 7CTT ACATT' 1 A n VvCTAATA^ 
9% 1 TATAATCAATJTTTJUWTTJUOTVrJ^ 

lOei Ar.XG\AV^J^CA.\OATGCACrrC 'AAATATC V.CT^XfAA^WOATT^^AtGlTCr'J 
114 1 AATGACATCAA -.ACACCC0T0GAAAAT^TTtrrCCA*3ACAC^ r7CT3rTGATATT 

: 2 ii crTjCTA r; ^v^ct^aatagct;^^^ rrccTAj* ttaattccc 

12€ 1 AA . JATT-AAAAJ^r:OCATGA^TOOArrCAAAATTC?AT%l ~;GTT TrO^TATAA 

nn A~TT^A..TT^ATTCCACTAAAAAAA.\rTATC^TAT^^ "ITTA iTAAAAAAAATTTATC 
. J 8 1 TAA~TT AATTTA-rrATTAAAACTAm TT AAAATTCAATCCTAA.TCTT-: TTTA VCGUA 
144 1 GCATJT\ASCTGG^CCCACCCrrAT*TCOTT^AAaAVn^rATA^*AA -t'ATTI AATTAAT 
1 E 0 1 < Ki A TOO A_\ TtlAUT CAAAACATTT AATT CAAAAT A CT OTT AATTG~* C A TT A G T AAT r XTOT 

1 b« 1 TC'-OC^A VTTTACCTrrOTCTATAArTAATT.'aACTTAATCAUA r/J\AAAAACAAATCjGAC 
162 1 0~AAJC-: jCTTGG^ATAi3ATArC^^ 

16 9 1 AAAA1C rAjGCTACTATArTATATTTAGTv/.'L'l 1". I rTTJTT VAA^CCA 7TV AACOTOA i . 
1741 T a'TJAi nXTTGAAA OAT ' " ^CA^CArAOGC^TA'lAA> r. CCT rOCAACTAACATfT-r 
1801 CAAA£ I .TUA JTATTT /CSAAflATAATTCATCTATViA itST'TC AAv/T JTATTATATA 

18G I r A TvTTAT CaTCGCA 31 . . AGAATT ATAA T.AG ?CAAATATA*JAA*JTAT VTCOGGT A P *AT 
1921 ^AjSTTGCaTCTCXXWC^TOTTTCCT^ 

1 9 P 1 OAAAAT AACGA r^AACTAAAAACCAAAOCXiTATCATAT AOTTTOAC: T LTT A TJTT AGATIA 

2 0< 1 CAOXCATCTTA^TTTC<rrt^TATCTTAAATA^ 

2 101 TOv'CATATCTA AAAJU^TOATJ^KAAT ATCAT AOOT AT ACTCAACT AT A TQA TAT CC* CAT A 
7X61 A CACAAA TTG T KJ ITl'T C rTCAQGCAATGAAXTrTAA CJ- "TTCTTTTTO<""T AAAAACAAA C 
2 221 ATCCACTTAAAarXXJTrCAACATAnTATOTAATA^ 

2^81 GC :^TXXraTTMGJUMaU^ A 
2 341 CCTT^CJUXACCTTTtnTO^^ 

•^Cl GAACACxTCTTOATTTJiaTn^^ XTTrGCTATTA 

2*61 eyrTACCAATAAAAArfmTTCrGftTAC^ 

2S-il 7TTATAATWUUkrrAT*3UkT^ 

2 5 9 1 AACTTTTACTAA-VrTTJUkacnOa^ 

264. AAAuATTGAAylATAAGTTAAAGTCTAc^^ 

27 01 ACCJTTTaaAAOX'XrrCGGTOCJU^ 

2 761 (KUAAC CCTtxATOCAACTCTGAACACAACAl ACTT ^OAACT ATTGQTTT^A AGA 1 3CC CC 
2 821 CAJC^TOCAAirrCTOOATJUk^ 

2 8 31 AACAG \TACT^CTJCAAXrTTCTC^<K7r CA^T30CT 

2 941 TTCTCCACT CCTC^TGCTQATACCATTCCC\1TT3TCAA rAGCTTCAGCAGTAACrLA0AAT 
2 0 01 ACTTTCTVTT T TAACTTTA3AOTrTCAATT^TAAAAATX>JG T AAT Al^'^JWTCCTGAC T 
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i- -> ± & — ATGOcrrrecArx^rr - cvattactagtcw^tvgtto 3 c 

U41bfT - 0 

X9069 3 O GCAAA - CAATC^ACTCCCTTOGTOCTGIAGCAATAG - CTTTGTGC 14 

X9C694 GCTCTTCAAAaT!AATQ>*ACTCC- - TTAGCAACTT- CTATCTGO 40 

LJ6156 - CTCC TTAOCAACTr - CTATCTGO 22 

X9C692 AATGCTTGO; CTAAaiTGCAACAG C I ' l l" I 1 GCTGTATGG 18 

L7 316 3 TGT GCATTT - OCT ATOCATOCAG G IT 1TTCAGT CTCTTATGC 7 7 

U41657 - _. o 

X y O s* 9 3 TGTATTvTO G l^TGTGCTTGGAGGGTTAC C CTT CT CI TCAAATGC 8 8 

XS.o94 TG TG VTCTI GCTTTTAG ritn^jri'GGAGGACT AC C CTTTTC CT CAGA TG C 90 

r ,3 6 1 5 6 TGTGTTGTGCTTTTAGTIXrrGCTrGG CAGATGC 7 2 

X90692 *TT - TTJTGCTJ * T TOOAGUAOTACCC rTTT - - - OAAATGC 75 

L 7 8 1 6 3 Tt^GCTTACTCCT ACGTTCTA CAGAGAAACATGTCCAAA TCTG TTCCCTA J 2 7 

'J 1 1 * 5 ' 0 

X 9 0 6 9 3 GCAACTTGATCrATCCTTTTACAGGAAC^CTTGTCCAAA FGTTAGTVC CA 13 8 

X 9 0 6 9 V AC AACTT VGTCCCACrrin IVACAGCAAAACGTGTCCAACTGTTAGTTCCA 1*0 

l3 € 1 5 S ACAACTTAGTCCCACTTTT7ACAGC TTCCA 122 

X 9 0 6 9 2 ACAACrAOATCCTI CATTT T^CAACAGTACATGTTCTAATCTTGATTCAA 12 5 

L7 9 1 S 3 TTGTGl TTGGAGTAATCTTCGATO C m ' i 1 ' IC ACCGATC C C CGAATCGGG 17 7 

U416S7 0 

X 9 6 9 3 TlXSrTCaTGAAGTCATAAGGAGTGTTT CT AAOAAAGATCCTCGTATGCTT 19 8 

X9C6 94 TTGTTAGCAATGTCTTAACAAACGTTTCTAAIjACAaATCCTro 190 

L3 5 ^56 TTGTT ^GCAATGTCTTAACAAACGTTTCT AAG ACAG AT C CT CGCATGCTT 1 7 2 

X90692 TCGTACGTGGTOTGCTCACAAATGTTTCAC^ 17 5 

L 7 b 1 C 3 OC CAG rCT CATGAGGCTTCAriTTCATGATVGC TTTCTTXLrtAGG TTGTGA 2 2 7 

u * 16 57 - TTTCixTtlATTGCri\XTT*rCAAGGTTQTGA 2 9 

X 90 6 93 OCTAGTXTTTGTCAGGCTTCACTTrCATGACT Xj 1 I I " 1 U ' rr CAAGGTTGrTGA 2 J( J 

X90 6 94 GCT AGTCT03 TCAGGCTPCACTTT CATGACTOTn T3TTCTOGGATOTGA 240 

L3 5 1 S6 GCTAOTCTCUTCAGGCTPCACTrTeATOA C lmTl ' lXylT CTGGGATGTGA 2 22 

X90692 GOTAOTCTCA?CAi3CKrrACATTTTCAT ;2S 



L 7 8 1 ^ 3 TOOAT CAl/ l'lTI OCTr^AACAAGAfTTGAT ACAATAQ^AAGC GAG C AAGA TO 2 7 7 

1 S 5 7 TGOATCACTTTTACTG AACAACACTGAT ACAATAGA AAQC GAGCAAGAT G 7 9 

X 306 93 TQCATCACJT TTT ACTAAACAAAACTGATAC CGTTGTGAG TGAACAAGATG 2 88 

X 9 06 94 TGCCt CA ^JT 1 ri til T G AACAA T ftCTQCTACAATCQTAAQCO AACAACAAQ 2 90 

L 16 156 TGCCTCAJ 1 1 1\WIUAACAATACTGCTAC^JWTCGTAAGCGAAC\ACAAG 2 72 

X 9 06 92 TGCCTCGA1 i riUOItiAACOATACGGCTACJLATAGTGAGCG^ 2 75 



L7 9 1 6 3 CACrrCCAAATArCAACTCAATAACAiJGAT^ 3i7 

04 1 6 5 ? CACTTCCAAATATtlAAjCTCAATAAGAOGATTQGAOT 129 

X 3 C 6 9 3 CrnTCCAAACAOAAACTCATTAAC* GGTTPOOATGTTGTaA ATCAAAT C 333 

X 9 0 5 94 C'rTTTCC\AATAACAACTCTCTXAGAGGTl*?QGA rGTTGTCAA T CAGATC 3 4 C 

^36^56 CTTTTCCAAATAACAA C r C T ClX *G<XXrrTrOQATOTI GTOA \7 ZAGAT C 3 2 2 

X*06 92 CACC\CCAAATAACAACTT!CATAAjGA3Gl*rrGGATGT^ 3 25 



L 7 9 1 6 3 AACIACACCGGTOGAAAATA^TTCTrC CAflACACAq l 1 VI LTH ri txm iAT^P 
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U4 1 6 5 7 \AGACAGCGCTC<IAAAATAGT03TC 

XM G6 9 3 AAAACJUSCTGTGGAAAAGGCTTGTCC^^ 

XV06 94 A^CTGCXTrGTAXaAAGTGCCT TGTCCTAATACAGTTrCTTGTGCTGATAT 

6 I S n AAAACTGCT GTAGAAAJCTrGCTTGTCCTAACACA^TTTC^ J TGCTG AT AT 

X 0 0 6 9 2 AAAACAGCG 3TGGAAAATGCTTCTCCTAACACAG 1 l , l l L"ri X;VGCTG^TAT 



L 7 a I C 3 TCITG CT A1TGCAC»CT<JAAAT AGCTTC'l G T" f - CTGGGAGGAGOT t CAGsiA 1X6 

U4 1 b 5 7 7CTTGCT ATTOCAGCTOAAATA^CTTC^rG ^i"GC*l"GGGAOGA J<J'i C - \GGA 2 2 8 

X 9 0 5 9 J TCTTGCTCrrTCTCCTlAATTATCATCl ACA - CTGGCAGATO.nx C fGAC 4 3 7 

X3C6 94 TCTTOCACTTOCTtKTrCAAGCA'^CTCTCTT - CTGGCACAAGCTCrTA^T 4 39 

L3 * TCTTCCACTTOCT- - -CAJU3CA>TCCTTTGTT-CTGGCAC^^jCKrTCCTAGT 4 1 B 

X 9 C 6 9 2 TCTTGCTCTTTCTUCTGAAATATC ATCTGAT - CT GG CAAA T GC r* C CT A CT -* 2 4 



r * • « • 



I.7t* 1 C 3 TGGCCA/rTTCCATTAtfUAAa\AO^AC^^^ *ACCOAA ?C CT 4 7 < 

o 4 J. 6 5 7 TGGCCAGTTCr>TTA^OAAGAACXJGACAGCTTAAC AGCAAACCGAAC C CT 2 7a 

X 906 9 3 TOnAAOOTTC CTTT AOOAAGLAATlAQATtX'; TTT A ^ C GOCAXA C C£ r; ■ f~ ACT le 7 

x * 06 9 * TGOA CGGTT C C -r TTAfiGAAGAAflGG ATGGTTT AAC C GCAA ACCGAA CA C T 48 9 

LI 6 1€6 TGGACGGTTCC ITTAGGAAGAAGGGATGG rTTAACCGCAAACCC AACACT 46 8 

X 9 C 6 9 2 TGGCA AGTiTCCATTAGGAAGAAjGGG Vi AGTT r I"GACAOCAA\' r .\ATTCCC'T 4 74 

»***••• , • * * . * * — • • • • ... • • 

L78 163 T<X1AAATCAAAACCTTCCAGCACC rTTCTTCAA - - CCTCA - CTCAACT TA 52 3 

^ 4 x 6 s " TGCAAATCAAAACCTTCCAGCACCTTTCTTC AA - - CCTCA CTCAACTTA 3 2 5 

X 9 0 6 j TGCI'AATXIAAAATCTTCCAGCTCC - - - rTTCAATAC TACTGATCAACTTA 5 3 4 ' 

X9059 * TOCAAA1VAAA\TCTTCCOGCTCC - - -ATTCAAT^CC^-C^^ 536 * 

L36 1 56 TGCAAA i CAAAATCTTCCGGCTCC - - - ATTCAATTCC1TGGATCACCTTA 515 

X90^92 3CAOCTCAAAATCTTCCT0CCCCCACTTTC1AA - - CCTTA - CTCGACTAA 52 i" 

♦**..**•******♦*.*•** * » # *♦* 

L78163 AACCTTCCrrrO-CTOTTCAAOOTCiCA^ 572 

U4 1 6 5 7 AAGCTTCCTTTO - CTtyiTCAAOOTCTCAACACCCTTGATTTAGTTACACT 3 7 4 

x 3 06 9 3 AAGCTGCATTTO - CTGCTCAA«aOTCTCt3AT ACTACTOATCTOaTTGCACT 58 3 

X9069 * AACCTGCATTT'ACroCTCJUtSGC^ S85 

L361S6 AA-CIX3CATrTtSACTGCTCAAGOCCTCATT ^CTCXHEXSTTCTAiTTTCCCCT 554 

X90692 AATCTAACTITGA-TAATauUUUXTCAxrr^^ S n 0 



***** • i 



L79163 ^TC^ GgTOGTCATAOG^^ 62 2 

X90693 cTo:xxmxrrcATACATTro^^ 63J 

x 9 0 6 9 4 CTO^TCKTrCATACATTW»AAa^OCT^ AGTC 63S 

"6156 CTCGGGTGCTCATACAT1TGGAA£IAGCTC^ AGTC 614 

x 90692 CTCAOOTOGCCATACAATTOQAAGA^XJTCAAT 3CAGATTTTTCG" TGATC 62 C 



L7 8 1 6 3 GATTATAX1AACITCAGCAACACTQGAAACC CTGATXTCAACTCTGA *VCACA 
C41*S7 OATTATACAACrTCAGCJUrACTOGA rTOATCCA - CT - YGGACACA 46i 



6 /2 
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X 9 0 6 9 3 GATTCTACAACTTCJYGCCX^ 6 8 3 

X90694 GATTOTACAACTTCAGCAflTACTOQAAGTCCCOAr^ €85 

L 3 6 1 5 6 GATIXJTACAACTTCAGCACTACTGGAAGTCC CGATCCAACTCTTAACACA 664 

X 9 0 6 9 2 OATTAT ACAATTTCAGCAAjCACTGOAAA CCC COLATTCAACTC TTAACACC* 6 70 

L 7 8 1 6 3 ACATACTTAGAAGTATTGCQHCGCAAGATGCCC C CA3AATGCAACTGGGGA 722 

U4 1 € 5 7 ACAT ACTTAGAAOTATTOCOTOCAAQATOCCC C CAIlAATGCAACTTOGGCiA 518 

X90693 ACTTACTTACAAaUiTTOCTOUJUiTATGTC 7 3 3 

X906 94 ACTTACTTACAACAACTGCCX2ACAATATCTCCGAATO 735 

L 3 6 1 5 6 ACTTACTTACAACAACTOCCIOICAATATC CTGGCAC 714 

X 9 0 6 9 2 ACCTATTTACAAACATTOCAACCAATATO^ 2 0 



L78163 TAACCTCACCAATTItiaJUXOTAOCACACCTOAT 7 72 

U41657 TAACCTCACCAATTTOTACCTT3AGCACACCT 56 8 

X 9 0 6 9 3 OAACCTITACXXfcTTTCaATCCAACQACTCCTaATA^ 7 83 

X 9 0 6 9 4 AAAC CTT ACCA A 1 1 l^2 ATCCAAJOGHCTCCTi^TX^TTT<Z^CAMZA£JZT 78 5 

t, 3 6 1 5 6 AAACCrrACCAATTTCOATCCAACOACTCCT 764 

X 9 0 6 9 2 AAACCTAACCGATTrOOACCCAACCACACa^ ACT 77 0 

L 7 8 1 6 3 ACTACTC CAAIXTTTCTCKIAOCTCAATOG CTTACTTCAGAGTGAC CAAGAA 8 2 2 

U4 16 5 7 ACTACTC CAATCTTCTCCAOCTCAATGGCTT ACTT CAGAGTGAC CAAGAA 6 1 e 

X 9 C 6 9 3 *TTACTCTAArCrrCAAGTOAAAAAAGGTTT^ 83 3 

X 9 0 6 9 4 ATTACTCCAATCTTCAAOTOAAAAA3GGTITCCT 8 3 5 

L361S6 ATTACTCCAATCTTCAAOTQAAAAAGGGTTTGCTC CAAAGTOATCAAGAG 814 

X 9 0 6 9 2 ACTACTCCAATCTCCAAOrfQQAAAOOQ C A ' IU 1 T ' l CAQAQTQACCAAQAQ 820 



L7 8 16 3 L-rinTCTCCAC"fCCT»rroCTqATACCATTCCC^ 8 72 

U41657 C ^ y rTTCTCCACTC C TtXITQCT g ATACCATTCC - ATTGTCAATAGCTTCAG 66" 

X 9 0 6 9 3 TT G TTCTCAACAT w rQQ TTC J UlATACCATTA 493 

X 9 0 6 94 riW TCraUOT f CTOCW O CM^T^^ 8 8? 

L36156 ITGTT C T C AA C r r CT Q CW q aUlATACCATTAGC^ 86 4 

X 9 0 & 9 2 C M innnnTCCAGAAA'A WITCIXiACACTA I i II 1 ATTOTCAATAGTTTCGC 8 70 

L78163 CAGT AACCAQA AT A C 1 1 ' lt, 1T1 1 CCAACTTTAGAGTTTCAATQATAAAAA * 2 2 

U4 1 6 5 7 CO- -AACCAGAATACTTTCTTTTC 3U MJTT I ' A Qfr UlT I ' CA ATOATAAAAA 71? 

X90691 AACCaArCAAAAAOC^rrTCTTOAflAOCTTTAGGGCTGCTATQATCA 933 

X 9 C 6 9 4 CACCGATCAAAATO Ca 1 HI 1 1U AGAOCTTIAAQQCTCXAATQATTAAAA 935 

L36156 CACCOATCAAAATG'- l'lTC'l I'lUAOROCTTTAAGGCTOCAATGATTAAAA 914 

X90692 CAATAATCAAA CiC 1LX lClTlU AAAATTTItyrAGCCTCAATQATAAAAA J2Q 

L78163 TG(XJTAATATTQQAGTQCTOACTOQqQATGAAOQAGAAA ITCUCr 1U CAA 972 

C41657 TGCXTTAATArrTXaWOTOCT^ ~6 5 

X90693 TOGOAAATATRXlTaTGTTJU^ 983 

X90694 TOQQCAAT A X w f l iin V10CT A ACAO<X»^ 98 S 

L3 6 1 S 6 TGGGCAATATTOOTGTWTIAACAGaC^ 964 

X90692 TOQGTAATATTO(»OTTTTJUK^^^ 970 
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L78163 TGTAATTTTOTOAA- - - TGGAOACTCGT - - - TTOGATTAGC 

U41657 TOTAATTTTGTGAA TGOAGACTCOT TTOGATTAGC 

* 3 TOCAACTTTGTTAATT CAAAAT CAGOVGAACTTGGTCTT AT 

X906 94 TGCAA C1 I 11* 1V AA C lU ' Xt/lti AACTCAAATTCTGCAGAACTAGATTTAMC 

L36 156 TGCAACTT - - - - TgTQAACTCAAATTCTGCAGAACTAC ATTTAGC 

X90692 TG TAATOCTOTOAATOGGAATTCTTC TOGATTGGC 



1007 
800 
102<i 
1035 
1005 
1005 



L78163
U41657
X90693
X90694
L36156
X90692



X90693
X90694
L36156
X90692



TAGTGT0GCGTCCAWUaATtrcTJUUtf3UUU^ 

TAGTGTCX3C0TCCAAAGATGCTAAACAAAAGCTT<JTTGCTCAATCT 

CAATGTTOCCTC AGGA0 - - ATTCATCTO - AGGAGGGTATGGTTAG - - 

CACCATAOCATCCATA3TAG- -AATCATTAQ- AGOATG3TArTGCTAGTG 
CACCATAOCATCCATACTAO- - AATCATTAG - AGGATGGAATTGCTAGTG 
TACTOTAGTCACCAA AO- - AATCATCAG - AAGA7GGAATGGCTAG CT 



AAACCAAT AA7TAATOGGQATTTTGCATGCTAGCTAGCATGT AAAGGC AAA 
AAAC CAAT AATTPJ^'rGGGGATGTCOATOCTAOCT A CGATt iTAAAGOCAAA 

CTCAATGTAAA-TG-TAO 

TAATATAAATA AATTAfl COTAA ATCCACTTATTGAA - AT CI TG 

rAATATAAATAAATTAG - CGAAAATGCACTTATTQLAA - ATCTTO 

CATTCTAAAT- -ATAAQ - CTTGOAAAATATTGAAGAGCTTCTAT 



1057 
950 
1CS6 
1082 
1052 
1049 



1107 
900 
1062 
1124 
1094 
1090 



L78163
U41657
X90693
X90694
L36156
X90692



TTAGOTTGTAAACC IC 1 1 lXjCTAGCTATATTGAAATAAACChAAGGAGTA 
TT.A^XTTTO- AAACCTCTTTGCTAGCTATATTGAAATAAACC^ 

T - - OATTOOAAOCAACTAA- - TAAATTA AGAAOC1 AVAAC T 

T- - GACTAGATGCCACTAA- -TAAAT AAOTTATAAC T 

T- -GACTAGATCCCACTAA- -TAAAT AAGTTATAAC T 

A- A TTTTQTG CATACATA - - TATOGTA1GTG - 



1157 
9-iS 
1119 
1XL7 
1127 
1118 



L78163
U41657
X90693
X90694
L36156
X90692



GTGTGCArcrr CAATTCOATTntX: - C ATUTACCTCTT GGAATAT 1 2 0 0 

<^GTCtlATCrrCAATTCGATITTC>r- CAIXTTA 9^8 

ATCCACATT - CAT?>TrATC TGTGAPATA GTT ATT AGATG CTT" T XJTUAGCA 1 1 € S 

AGGCACATTTCA'XtJTCACTTRAAATTTCATGOPT - GTATATOAG 1 2 p. 0 

ATOC ACA 1TT* t ATO f CACTTQAAATCCTATGCCTTGIAT^ -TT AGAOGACG 1 1 n 7 

-CATcmxTrrr^ - -ttatotttttgttatottcttca *gttg atca nei 



L78163
U41657
X90693
X90694
L36156
X90692



* 1.200 

ATAACTATrrOAATCTC AAAAAAAAAAAAAAAA \ 03 1 

AAAATCTTTTGGATTTC ATTTGAAGTaTr "CT 1200 

' " — lioo 

TOT-TCIT C- TTO'TTATTXTACTA - - T 1200 

CX5QA - CIXiTAGAAGCTCCCTAATAAT^TlTGTOTOUVAGT 1 2 0 0 
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L78163 
U41657 
X90693
X90694 
L36156 
X90692 



FIGURE 3B

MGSMRIXWAJ^I^VFAMKAGPSVS Y - - -AQLTPTFYRETCPNLFPIVFOV 
KNSUUWAIAIXX:iV--VVIXXU,PFSSK^^ 

MNSL- - -ATSMWCVVXXVVUXn.PPSSrAQl^PTFYSKTC?TVSSXVSNV 

M _-wcV\rtJ*VVIXK^FSSDAQI^FTF^^ 

MLGLSATA- - -FCCMVFVLIOOVPPS -HAQf^PSr^STCSKU)SIVRGV 



47 
0 
48 
47 
40 
46 



L78163 
U41657
X90693 
X90694 
L36156 
X90692 



IPT)/3PTi;PaiGASl>!IU*HFHDCFVQGCDaSVIXNNTDTlESK 

FHDCFVQGCDOSVI*LNNTDTIESBQDALPNX 

IRSVSKXDPRKIJ^LVRLHFHDCFWQGC^^ 

iynr/SJOT>I?RMLASI,VRLHFH^ 

LTNVSXTDFRKIASLVRIJHFHDCFV^ 

LTKVSOSDPiy^UJSLIMiHFHDCfVQOaiAS I LiiJDVATTVS BQS APPNN 



97 
31 
98 
97 
90 
96 



L78163
U41657
X90693 
X90694 
L36156 
X90692 



NSIRGtJWVKDIKTAVKMSCPDTVSC^rtJVIAAEIAS' 

NSIRaUTVn/NDIKTAVXHSCPDTVSC^ 

NSIJW3IJDVVNQIlCTAV^JUtf:PNT^^^ 

NS UIGLDVVNQI JQAVEVFCPNTVS CAD IUU AAQASSVU^QG P S WTVP L 
NS LRGUDfWNQ I KTA VBSACPWTVS CAT5 I LALA - Q A 3 SVLAQG P S WT/P L 
NSIRGUJVINQIKTAVKKACPNTVSCADILAI^A^^ 



147 
31 
148 
147 
139 
146 



L7 8 lo 3 GRRDSI TANRTLJVNQNl^APFFtfL^ 

U4 3 6 5 7 GRRDS LTAHRTlJUfQNLPAPFFHLTQLXAS ^AVQGLNTLDLVTLi^GGHTS 

X 9 0 6 9 3 GRIUXJLTAirQIJJCIUtn/PAPFSTTTDQLICPAF AAOGLDTTDLVAI*SCSAHTF 

X 9 0 6 9 4 GRRDGLTAKRTLANQNt*PAPFNS LDQ CJCAAFTAQGLNTTDLV.\J ..SG AHTF 

L36156 GRRDGXjTAKRTT*ANG&II*PAPFWS LDHLKLKLTAQGL I TP VLV AI*SGAHT?? 

X906S2 GRRD 3 LTAN^S LAAQHLPAPYFin-TRUCSN FDSO^ S TTD L V ALSG CiflT I 



197 
131 
133 
197 
189 
196 
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L7316 3 (jRARCSTFIKRLYNFSNTGOTDPTIJnT^ 

U416S7 GRARCS TFINRIATI?SNTCa : IH--IJrrr^ 

X90G93 arJ^CSIjn/SRLTOFSaTaSPDPTLKTI^^ 

X9 0 6 9 4 GK AHCA^KVS RLYH>SST05iPDPTI-KTTVI-QC ORT I C PK^Vl PGTNXiTN FD 

L3 6 1 56 GR \HCA4^FVSRIiYHPSBT\3flroPTI^rrTYI^Cl^TICPHG-*PGTNLT>!?D 

X906 92 GEOQCRFFVDRI»YN!*SWrGflN PDSTLNTTYI/QTT^AI CTNGGPG TNLiTDTjD 
#*..* * ^ . . 

L7916 3 LSTPDQFDWRYYSKIXClI^SLI*G3DQH.G F3 TPGADT I P IVWS F S SNQNTF 

U416 57 I^T?DQFDKRYY3NI*liOLNGIXQSDQKK^STPGADTIPl*^ IA- SANQNTF 

X90593 PTTPDKFDKirr^ SKZiQVKK/3IJ^3DQBL FSTSGSUT I S IVNKFATDQKAP 

X906 94 pTWDKFDCHYYflHUJVXFjQLLQSD^^ IVKXF3TDQNAP 

L.36156 PTTTOICFDIQrrrSMI^r^ja^ 

X90692 srnpzm^OTYYSmXJVOKOI^t^ 

**• *♦#**#* *♦ ...*. * 

L76163 FSWT>TVSKIJ3*OT<r7Mt30BaBIR GDSFGIASVA3-K 

U4 1 6 5 7 FSNF> VSKIKMmnOVLTODBGB X W.QCOTVH CD<i FGLASVAS - K 

X9C6*3 FBSFRAAMIlMClflC^TaWBIRlCQOIFVll - - -SKSAKLGLIKVAS - A 

X90694 FKSF*JUU«m»iaV2.Ttl^ -*V 

L361S€ F^FKAAMIXMOWiaVLTOTXOEIRXQCKFVM SNL- ABLDLAT I AS I V 
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X90692 FKKFVASKIKXGNIGVLTGSOOBIRTOCNAVN GNSSGLATWT-K 340 
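The aligned rows in the figures above invite a pairwise comparison between accessions. A minimal Python sketch of one common convention follows: percent identity over the columns where neither row has a gap. The function name and the two toy rows are hypothetical; the scanned alignment rows are too damaged to use directly, and other denominators (for example, full alignment length) give different figures.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity between two aligned rows of equal length.

    Columns in which either row has a gap are skipped, so the
    denominator is the number of gap-free columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy rows, not copied from the figure.
print(round(percent_identity("MGSM-RLL", "MNSMARL-"), 1))  # 83.3
```

The same function works for nucleotide rows, since it only compares characters column by column.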
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Figure 4 
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Figure 7 
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